Re: [PR] Add coerce int96 option for Parquet to support different TimeUnits, test int96_from_spark.parquet from parquet-testing [datafusion]

via GitHub Mon, 14 Apr 2025 14:27:43 -0700


parthchandra commented on code in PR #15537:
URL: https://github.com/apache/datafusion/pull/15537#discussion_r2042968195



##########
datafusion/common/src/config.rs:
##########
@@ -459,6 +459,14 @@ config_namespace! {
         /// BLOB instead.
         pub binary_as_string: bool, default = false
 
+        /// (reading) If true, parquet reader will read columns of
+        /// physical type int96 as originating from a different resolution
+        /// than nanosecond. This is useful for reading data from systems like Spark
+        /// which stores microsecond resolution timestamps in an int96 allowing it
+        /// to write values with a larger date range than 64-bit timestamps with
+        /// nanosecond resolution.
+        pub coerce_int96: Option<String>, transform = str::to_lowercase, default = None

Review Comment:
   IIRC, the use of int96 originated from Impala/Parquet-cpp, where it was used to store nanoseconds (the C++ implementation came from the Impala team). I think the Java implementation ended up with int96 in order to be compatible. Spark came along with its own variant and, well, here we are.
   (https://issues.apache.org/jira/browse/PARQUET-323)
   The Parquet community assumed that this was the only usage of int96 before it was deprecated, so I feel it is safe for us to assume the same.
   That said, I feel it can be done as a follow-up.
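To illustrate the range problem the `coerce_int96` doc comment describes, here is a minimal Rust sketch, assuming the common Impala/Spark INT96 layout: 8 bytes of little-endian nanoseconds within the day, followed by a 4-byte little-endian Julian day number. The `Int96Unit` enum, the accepted strings `"s"`/`"ms"`/`"us"`/`"ns"`, and the functions `decode_int96` and `encode_int96` are hypothetical names for illustration only, not DataFusion's API.

```rust
// Hypothetical sketch, not DataFusion's implementation: decode a Parquet
// INT96 timestamp (Impala/Spark layout) into an i64 in a chosen unit.

/// Julian day number of the Unix epoch, 1970-01-01.
const JULIAN_DAY_OF_EPOCH: i64 = 2_440_588;
const NANOS_PER_DAY: i64 = 86_400 * 1_000_000_000;

/// Hypothetical target unit for coercion.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Int96Unit {
    Second,
    Millisecond,
    Microsecond,
    Nanosecond,
}

impl Int96Unit {
    /// Parse an option value. Lowercasing first mirrors the
    /// `transform = str::to_lowercase` in the config definition;
    /// the accepted strings are an assumption.
    fn parse(s: &str) -> Option<Self> {
        match s.to_lowercase().as_str() {
            "s" => Some(Self::Second),
            "ms" => Some(Self::Millisecond),
            "us" => Some(Self::Microsecond),
            "ns" => Some(Self::Nanosecond),
            _ => None,
        }
    }

    /// Number of nanoseconds in one tick of this unit.
    fn nanos_per_tick(self) -> i64 {
        match self {
            Self::Second => 1_000_000_000,
            Self::Millisecond => 1_000_000,
            Self::Microsecond => 1_000,
            Self::Nanosecond => 1,
        }
    }
}

/// Build raw INT96 bytes (illustration helper): bytes 0..8 hold the
/// nanoseconds within the day, bytes 8..12 the Julian day, both little-endian.
fn encode_int96(nanos_of_day: i64, julian_day: i32) -> [u8; 12] {
    let mut raw = [0u8; 12];
    raw[0..8].copy_from_slice(&nanos_of_day.to_le_bytes());
    raw[8..12].copy_from_slice(&julian_day.to_le_bytes());
    raw
}

/// Decode raw INT96 bytes into ticks of `unit` since the Unix epoch.
/// Returns None when the value does not fit in an i64 of that unit.
fn decode_int96(raw: [u8; 12], unit: Int96Unit) -> Option<i64> {
    let mut lo = [0u8; 8];
    lo.copy_from_slice(&raw[0..8]);
    let mut hi = [0u8; 4];
    hi.copy_from_slice(&raw[8..12]);

    let nanos_of_day = i64::from_le_bytes(lo);
    let days = i64::from(i32::from_le_bytes(hi)) - JULIAN_DAY_OF_EPOCH;

    let per_tick = unit.nanos_per_tick();
    let ticks_per_day = NANOS_PER_DAY / per_tick;
    // Checked arithmetic: out-of-range dates yield None instead of wrapping.
    days.checked_mul(ticks_per_day)?
        .checked_add(nanos_of_day / per_tick)
}
```

For example, a value whose Julian day falls around the year 1000 (day 2,086,308) makes `decode_int96(raw, Int96Unit::Nanosecond)` return `None`, because a signed 64-bit nanosecond count only spans roughly the years 1677 to 2262, while the same value decodes fine with `Int96Unit::Microsecond`.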



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

