etseidl commented on code in PR #186:
URL: https://github.com/apache/parquet-site/pull/186#discussion_r3366684763


##########
content/en/docs/File Format/versions.md:
##########
@@ -0,0 +1,267 @@
+---
+title: "Parquet format versions"
+linkTitle: "Format versions"
+weight: 9
+---
+
+This page describes how features are added to the [Parquet format
+specification](https://github.com/apache/parquet-format) and how they affect
+reader and writer compatibility. See the
+[Implementation status](../implementationstatus/) page for which implementations
+(arrow, parquet-java, arrow-rs, etc.) support each feature.
+
+*Note*: If you find out-of-date information, please open an issue or pull request.
+
+## Feature compatibility
+
+The Parquet format spec [classifies changes] by their effect on reader and
+writer compatibility. Changes differ in their *forward* compatibility — whether
+an older reader can read files that use a newer feature.
+
+**Forward compatible** features remain **readable by older readers**, with a
+possibly degraded experience: some metadata may be missing or performance may
+suffer, but the reader does not fail. Examples:
+
+* **Bloom filters**: a reader that ignores them skips the pruning metadata but
+  still reads the data correctly.
+* **Logical type annotations** such as `VARIANT`: an older reader reads the
+  underlying physical column (e.g. `BYTE_ARRAY`) as raw bytes without applying
+  the logical type.
+
+**Forward incompatible** features make the data **unreadable** to older software.
+Examples:
+
+* **New encodings** (e.g. the `DELTA_*` encodings, `BYTE_STREAM_SPLIT`,
+  `RLE_DICTIONARY`): a reader that does not implement them cannot decode the
+  column values.
+* **Data Page V2 headers**: a reader that only understands `DataPageHeader`
+  cannot parse `DataPageHeaderV2` pages.
+
+[classifies changes]: https://github.com/apache/parquet-format/blob/master/CONTRIBUTING.md#compatibility-and-feature-enablement
+
+## `FileMetadata` version field
+
+Each Parquet file has a `version` field in the [`thrift FileMetadata`] that
+declares which features the file may use, and thus what a reader **must** support
+to read it.
+
+**Note**: Many writers set the version field to `1` even for files that use
+format 2.0 features, which has caused [confusion and interoperability
+issues][closing-out-2.0].
+
+## `parquet-format` release versions
+
+The Thrift definition is released independently of implementations such as
+parquet-java or arrow-rs, following the Apache release process and
+[semantic versioning]:
+
+1. The major version corresponds to the [`thrift FileMetadata`] `version` field.
+2. Minor releases (e.g. `2.10.0` to `2.11.0`) may add compatible
+   features, but never incompatible ones. The minor version is not recorded in the
+   file itself.
+
+## Adding new features
+
+New features are added by discussion and voting on the [parquet dev mailing list]
+(full process [here]). Once approved, a feature is added to the spec and ships in
+the next parquet-format release.
+
+[parquet dev mailing list]: https://lists.apache.org/[email protected]
+[semantic versioning]: https://semver.org/
+[`thrift FileMetadata`]: https://github.com/apache/parquet-format/blob/c42c2cb4ecfccb38153375e24b702a82fd763cc0/src/main/thrift/parquet.thrift#L1365-L1373
+[here]: https://github.com/apache/parquet-format/blob/master/CONTRIBUTING.md#additionschanges-to-the-format
+[closing-out-2.0]: https://lists.apache.org/thread/0bdyyb7qobrxx94x8v7t5z7g2ksnpyr2
+
+## Forward incompatible features by version
+
+Forward incompatible features and the format version each became available in:
+
+* **V1**: the original Parquet format (1.0).
+* **V2**: format version 2.0.
+
+| Feature | V1 | V2 | Released in | Source | Notes |
+| ------------------------------------------ | ---- | ---- | ----------------------------- | --- | ------------------------- |
+| [BOOLEAN] | ✅ | ✅ | [1.0.0] | [1.0.0][tree-1.0.0] |  |
+| [INT32] | ✅ | ✅ | [1.0.0] | [1.0.0][tree-1.0.0] |  |
+| [INT64] | ✅ | ✅ | [1.0.0] | [1.0.0][tree-1.0.0] |  |
+| [INT96 (deprecated)] | ✅ | ✅ | [1.0.0] | [1.0.0][tree-1.0.0] |  |
+| [FLOAT] | ✅ | ✅ | [1.0.0] | [1.0.0][tree-1.0.0] |  |
+| [DOUBLE] | ✅ | ✅ | [1.0.0] | [1.0.0][tree-1.0.0] |  |
+| [BYTE_ARRAY] | ✅ | ✅ | [1.0.0] | [1.0.0][tree-1.0.0] |  |
+| [FIXED_LEN_BYTE_ARRAY] | ✅ | ✅ | [1.0.0] | [1.0.0][tree-1.0.0] |  |
+| [Data Page V1] | ✅ | ✅ | [1.0.0] | [1.0.0][tree-1.0.0] |  |
+| [Data Page V2] |  | ✅ | [2.0.0] | [1.0.0..2.0.0] |  |
+| [PLAIN] | ✅ | ✅ | [1.0.0] | [1.0.0][tree-1.0.0] |  |
+| [PLAIN_DICTIONARY] | ✅ | ✅ | [1.0.0] | [1.0.0][tree-1.0.0] |  |
+| [RLE] | ✅ | ✅ | [1.0.0] | [1.0.0][tree-1.0.0] |  |
+| [BIT_PACKED (deprecated)] | ✅ | ✅ | [1.0.0] | [1.0.0][tree-1.0.0] |  |
+| [RLE_DICTIONARY] |  | ✅ | [2.0.0] | [1.0.0..2.0.0] |  |
+| [DELTA_BINARY_PACKED] |  | ✅ | [2.0.0] | [1.0.0..2.0.0] |  |
+| [DELTA_LENGTH_BYTE_ARRAY] |  | ✅ | [2.0.0] | [1.0.0..2.0.0] |  |
+| [DELTA_BYTE_ARRAY] |  | ✅ | [2.0.0] | [1.0.0..2.0.0] |  |
+| [BYTE_STREAM_SPLIT] |  | ✅ | [2.8.0] | [2.7.0..2.8.0] | [Approved 2019-12-03] |
+| [BYTE_STREAM_SPLIT<br/>(Additional Types)] |  | ✅ | [2.11.0] | [2.10.0..2.11.0] | [Approved 2024-03-18] |
+| [UNCOMPRESSED] | ✅ | ✅ | [1.0.0] | [1.0.0][tree-1.0.0] |  |
+| [SNAPPY] | ✅ | ✅ | [1.0.0] | [1.0.0][tree-1.0.0] |  |
+| [GZIP] | ✅ | ✅ | [1.0.0] | [1.0.0][tree-1.0.0] |  |
+| [LZO] | ✅ | ✅ | [1.0.0] | [1.0.0][tree-1.0.0] |  |
+| [BROTLI] |  | ✅ | [2.4.0] | [2.3.1..2.4.0] |  |
+| [LZ4 (deprecated)] |  | ✅ | [2.4.0] | [2.3.1..2.4.0] |  |
+| [LZ4_RAW] |  | ✅ | [2.9.0] | [2.8.0..2.9.0] |  |
+| [ZSTD] |  | ✅ | [2.4.0] | [2.3.1..2.4.0] |  |
+| [Modular encryption] |  | ✅ | [2.7.0] | [2.6.0..2.7.0] | [Approved 2019-01-16] |
+
+
+> **Note:** Files with an [encrypted footer] use different magic bytes (`PARE`
+> instead of `PAR1`), making it clear to readers they must support modular
+> encryption to read the file; [plaintext footer] files use `PAR1` so legacy
+> readers can still read their unencrypted columns.
+
+## Forward compatible additions
+
+Older readers can read files that use these features but may not understand the
+new information.
+
+| Feature | Released in | Source | Notes |
+| ------------------------------------------- | ----------------------------- | --- | ----------------------------------------------------------- |
+| [xxHash-based bloom filters] | [2.7.0] | [2.6.0..2.7.0] | [Approved 2019-09-09] |
+| [Bloom filter length] | [2.10.0] | [2.9.0..2.10.0] |  |
+| [Page index] | [2.4.0] | [2.3.1..2.4.0] |  |
+| [Page CRC32 checksum] | [1.0.0] | [1.0.0][tree-1.0.0] |  |
+| [Size statistics] | [2.10.0] | [2.9.0..2.10.0] | [Approved 2023-11-14] |
+| [Geospatial statistics] | [2.11.0] | [2.10.0..2.11.0] | [Approved 2025-02-09] |
+| [Binary protocol extensions] | [2.11.0] | [2.10.0..2.11.0] | [Approved 2024-09-06] |
+| [IEEE 754 total order and NaN counts] | not yet released | [#514] | [Approved 2026-05-26] |
+| [LogicalType union] | [2.4.0] | [2.3.1..2.4.0] | Supersedes `ConvertedType` enum<br/>deprecated in [2.9.0] |

Review Comment:
   The deprecation only means that no new values are added to the `ConvertedType` enum. Writers are still required to populate both `ConvertedType` and `LogicalType`, although the wording differs depending on which paragraph you read: [one place](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#metadata) says "should", while [two paragraphs later](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#compatibility) it says "must".
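
To make the dual-annotation point concrete, here is a minimal Python sketch of what a writer might record for a column. The `LEGACY_CONVERTED_TYPE` table and the `annotate` helper are hypothetical names used only for illustration, not part of any Parquet implementation, and the table covers only a subset of the pairs described in LogicalTypes.md.

```python
# Hypothetical, partial mapping from a LogicalType annotation to its legacy
# ConvertedType equivalent. Newer logical types (for example UUID) have no
# ConvertedType, because the enum no longer receives new values.
LEGACY_CONVERTED_TYPE = {
    "STRING": "UTF8",
    "ENUM": "ENUM",
    "DECIMAL": "DECIMAL",
    "DATE": "DATE",
    "JSON": "JSON",
    "BSON": "BSON",
    "LIST": "LIST",
    "MAP": "MAP",
}


def annotate(logical_type: str) -> dict:
    """Return the annotations an illustrative writer would record for a column.

    The logical type is always written; the legacy converted type is written
    alongside it whenever an equivalent exists, so older readers that only
    understand ConvertedType can still interpret the column.
    """
    annotations = {"logical_type": logical_type}
    converted = LEGACY_CONVERTED_TYPE.get(logical_type)
    if converted is not None:
        annotations["converted_type"] = converted
    return annotations


if __name__ == "__main__":
    print(annotate("STRING"))
    print(annotate("UUID"))
```

Under these assumptions, a `STRING` column carries both annotations, while a `UUID` column carries only the logical type because no legacy equivalent exists.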


