This is an automated email from the ASF dual-hosted git repository.
gangwu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/parquet-format.git
The following commit(s) were added to refs/heads/master by this push:
new 300b018 Clarify Int96 status and add recommended ordering (#504)
300b018 is described below
commit 300b018e0beffb4c335b4a0d7763d9edc1f3cd06
Author: Andrew Lamb
AuthorDate: Sat Aug 23 23:03:55 2025 -0700
Clarify Int96 status and add recommended ordering (#504)
---
src/main/thrift/parquet.thrift | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
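The recommended INT96 statistics ordering added in this change (compare the last 4 bytes as the day, then the first 8 bytes as nanoseconds) can be sketched as a sort key. This is a minimal, non-authoritative sketch. It assumes the legacy 12-byte INT96 timestamp layout implied by the diff: bytes 0..8 hold nanoseconds within the day as a little-endian signed 64-bit integer, and bytes 8..12 hold the Julian day as a little-endian signed 32-bit integer. The names `int96_sort_key`, `compare_int96` and `make_int96` are hypothetical helpers, not part of any Parquet library API.

```python
import struct

# Hypothetical helpers illustrating the legacy INT96 ordering; not a Parquet API.
# Assumed layout: "<qi" = little-endian int64 nanos (8 bytes) + int32 days (4 bytes).
_INT96_LAYOUT = struct.Struct("<qi")


def int96_sort_key(value: bytes) -> tuple:
    """Return (days, nanos) so that tuple ordering matches the legacy rules."""
    if len(value) != 12:
        raise ValueError(f"INT96 value must be 12 bytes, got {len(value)}")
    nanos, days = _INT96_LAYOUT.unpack(value)
    # Days (last 4 bytes) are compared first; nanos (first 8 bytes) break ties.
    return (days, nanos)


def compare_int96(a: bytes, b: bytes) -> int:
    """Three-way comparison: -1 if a < b, 0 if equal, 1 if a > b."""
    ka, kb = int96_sort_key(a), int96_sort_key(b)
    return (ka > kb) - (ka < kb)


def make_int96(days: int, nanos: int) -> bytes:
    """Encode a (Julian day, nanoseconds-of-day) pair in the assumed layout."""
    return _INT96_LAYOUT.pack(nanos, days)


# Julian day 2440588 corresponds to 1970-01-01.
epoch = make_int96(2440588, 0)
later_same_day = make_int96(2440588, 1_000)
next_day = make_int96(2440589, 0)
values = [next_day, later_same_day, epoch]

# Min/max statistics under the recommended ordering.
stats_min = min(values, key=int96_sort_key)
stats_max = max(values, key=int96_sort_key)
```

Plain unsigned byte-wise comparison (Python's default `bytes` ordering) would sort `later_same_day` after `next_day`, because the low-order nanosecond byte comes first in the little-endian encoding. That mismatch is why INT96 ordering is listed as undefined and why an explicit rule is recommended.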
diff --git a/src/main/thrift/parquet.thrift b/src/main/thrift/parquet.thrift
index 59ec5f1..e99c461 100644
--- a/src/main/thrift/parquet.thrift
+++ b/src/main/thrift/parquet.thrift
@@ -33,7 +33,7 @@ enum Type {
BOOLEAN = 0;
INT32 = 1;
INT64 = 2;
- INT96 = 3; // deprecated, only used by legacy implementations.
+ INT96 = 3; // deprecated, new Parquet writers should not write data in INT96
FLOAT = 4;
DOUBLE = 5;
BYTE_ARRAY = 6;
@@ -1076,12 +1076,21 @@ union ColumnOrder {
* BOOLEAN - false, true
* INT32 - signed comparison
* INT64 - signed comparison
- * INT96 (only used for legacy timestamps) - undefined
+ * INT96 (only used for legacy timestamps) - undefined(+)
* FLOAT - signed comparison of the represented value (*)
* DOUBLE - signed comparison of the represented value (*)
* BYTE_ARRAY - unsigned byte-wise comparison
* FIXED_LEN_BYTE_ARRAY - unsigned byte-wise comparison
*
+ * (+) While the INT96 type has been deprecated, at the time of writing it is
+ * still used in many legacy systems. If a Parquet implementation chooses
+ * to write statistics for INT96 columns, it is recommended to order them
+ * according to the legacy rules:
+ * - compare the last 4 bytes (days) as a little-endian 32-bit signed integer
+ * - if equal last 4 bytes, compare the first 8 bytes as a little-endian
+ * 64-bit signed integer (nanos)
+ * See https://github.com/apache/parquet-format/issues/502 for more details
+ *
* (*) Because the sorting order is not specified properly for floating
* point values (relations vs. total ordering) the following
* compatibility rules should be applied when reading statistics: