This is an automated email from the ASF dual-hosted git repository.

emkornfield pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/parquet-format.git


The following commit(s) were added to refs/heads/master by this push:
     new 4b1c72c  GH-534: Clarify versioning and V2 (#535)
4b1c72c is described below

commit 4b1c72c837bec5b792b2514f0057533030fcedf8
Author: emkornfield <[email protected]>
AuthorDate: Fri Dec 19 08:37:30 2025 -0800

    GH-534: Clarify versioning and V2 (#535)
    
    Clarify versioning and no restrictions on encodings.
    
    Co-authored-by: Andrew Lamb <[email protected]>
    Co-authored-by: Joris Van den Bossche <[email protected]>
    Co-authored-by: Fokko Driesprong <[email protected]>
    Co-authored-by: Antoine Pitrou <[email protected]>
---
 Encodings.md                   |  7 +++++--
 src/main/thrift/parquet.thrift | 16 ++++++++++++++--
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/Encodings.md b/Encodings.md
index 62b4eb9..e620e9a 100644
--- a/Encodings.md
+++ b/Encodings.md
@@ -22,6 +22,9 @@ Parquet encoding definitions
 
 This file contains the specification of all supported encodings.
 
+Unless otherwise stated in page or encoding documentation, any encoding can be
+used with any page type.
+
 <a name="PLAIN"></a>
 ### Plain: (PLAIN = 0)
 
@@ -59,8 +62,8 @@ Dictionary page format: the entries in the dictionary using the [plain](#PLAIN)
 Data page format: the bit width used to encode the entry ids stored as 1 byte (max bit width = 32),
 followed by the values encoded using RLE/Bit packed described above (with the given bit width).
 
-Using the PLAIN_DICTIONARY enum value is deprecated in the Parquet 2.0 specification. Prefer using RLE_DICTIONARY
-in a data page and PLAIN in a dictionary page for Parquet 2.0+ files.
+Using the `PLAIN_DICTIONARY` enum value is deprecated, use `RLE_DICTIONARY`
+in a data page and `PLAIN` in a dictionary page for new Parquet files.
 
 <a name="RLE"></a>
 ### Run Length Encoding / Bit-Packing Hybrid (RLE = 3)
diff --git a/src/main/thrift/parquet.thrift b/src/main/thrift/parquet.thrift
index e99c461..7ff9b9f 100644
--- a/src/main/thrift/parquet.thrift
+++ b/src/main/thrift/parquet.thrift
@@ -712,9 +712,14 @@ struct DictionaryPageHeader {
 }
 
 /**
- * New page format allowing reading levels without decompressing the data
+ * Alternate page format allowing reading levels without decompressing the data
  * Repetition and definition levels are uncompressed
  * The remaining section containing the data is compressed if is_compressed is true
+ *
+ * Implementation note - this header is not necessarily a strict improvement over
+ * `DataPageHeader` (in particular the original header might provide better compression
+ * in some scenarios). Page indexes require pages to start and end at row boundaries,
+ * regardless of which page header is used.
  **/
 struct DataPageHeaderV2 {
   /** Number of values, including NULLs, in this data page. **/
@@ -1255,7 +1260,14 @@ union EncryptionAlgorithm {
  * Description for file metadata
  */
 struct FileMetaData {
-  /** Version of this file **/
+  /** Version of this file 
+    * 
+    * As of December 2025, there is no agreed upon consensus of what constitutes
+    * version 2 of the file. For maximum compatibility with readers, writers should
+    * always populate "1" for version. For maximum compatibility with writers,
+    * readers should accept "1" and "2" interchangeably.  All other versions are
+    * reserved for potential future use-cases.
+    */
   1: required i32 version
 
   /** Parquet schema for this file.  This schema contains metadata for all the columns.
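
The `FileMetaData.version` guidance in the last hunk can be sketched roughly as follows. This is a minimal illustration, not code from parquet-format or any Parquet library: the helper names `version_to_write` and `check_reader_version` are hypothetical, and rejecting reserved versions with an error is an assumption (the comment only says other values are reserved).

```python
# Hypothetical helpers illustrating the version guidance; not a real Parquet API.
WRITER_VERSION = 1          # writers should always populate 1
ACCEPTED_VERSIONS = {1, 2}  # readers should accept 1 and 2 interchangeably


def version_to_write() -> int:
    """Return the value a writer would store in FileMetaData.version."""
    return WRITER_VERSION


def check_reader_version(version: int) -> None:
    """Accept 1 and 2 without distinction.

    Raising ValueError for any other value is an assumption of this sketch;
    the spec comment only states that other versions are reserved.
    """
    if version not in ACCEPTED_VERSIONS:
        raise ValueError(f"unsupported Parquet file version: {version}")
```

A reader following this sketch would not branch on whether the stored value is 1 or 2, since the comment states there is no agreed definition of what version 2 means.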
