mbutrovich commented on code in PR #15537:
URL: https://github.com/apache/datafusion/pull/15537#discussion_r2042835936


##########
datafusion/common/src/config.rs:
##########
@@ -459,6 +459,14 @@ config_namespace! {
         /// BLOB instead.
         pub binary_as_string: bool, default = false
 
+        /// (reading) If true, parquet reader will read columns of
+        /// physical type int96 as originating from a different resolution
+        /// than nanosecond. This is useful for reading data from systems like Spark
+        /// which stores microsecond resolution timestamps in an int96 allowing it
+        /// to write values with a larger date range than 64-bit timestamps with
+        /// nanosecond resolution.
+        pub coerce_int96: Option<String>, transform = str::to_lowercase, default = None

Review Comment:
   > I wonder if there is any usecase for int96 other than timestamps.
   
   Not as far as I know, but I don't think the (deprecated) int96 spec said 
that it _had_ to represent a timestamp. It's just where Spark, Hive, Impala, 
etc. ended up.
   
   > Specifically, maybe we can simply always change the behavior and coerce 
int96 --> microseconds
   > At the very least default the option to be enabled perhaps
   
   It's not clear to me if we should assume that an int96 originated from a 
system that treated the originating timestamp as microseconds. While it's 
very _likely_ that it originated from one of those systems, I don't know how to 
treat the default in this case. Snowflake, for example, seems to use 
microseconds for its timestamps when dealing with Iceberg:
   
https://docs.snowflake.com/en/user-guide/tables-iceberg-data-types#supported-data-types-for-iceberg-tables
   
   I'm hesitant to mess with defaults, but am open to hearing more from the 
community. @parthchandra 
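
To make the trade-off concrete, here is a minimal sketch of why the target resolution matters when an int96 is narrowed to a 64-bit timestamp. It assumes the layout conventionally used by Spark, Hive, and Impala: 8 little-endian bytes of nanoseconds within the day, followed by 4 little-endian bytes holding the Julian day number. The function names (`split_int96`, `int96_to_micros`, `int96_to_nanos`, `encode_int96`) are hypothetical and are not DataFusion or arrow-rs APIs; this is an illustration, not the PR's implementation.

```rust
/// Julian day number of the Unix epoch (1970-01-01).
const JULIAN_DAY_OF_EPOCH: i64 = 2_440_588;
const MICROS_PER_DAY: i64 = 86_400_000_000;
const NANOS_PER_DAY: i64 = 86_400_000_000_000;

/// Split a 12-byte int96 into (julian_day, nanos_of_day).
/// Assumed layout: bytes 0..8 = nanoseconds within the day (little endian),
/// bytes 8..12 = Julian day number (little endian).
fn split_int96(bytes: [u8; 12]) -> (i64, i64) {
    let mut nanos = [0u8; 8];
    nanos.copy_from_slice(&bytes[..8]);
    let mut day = [0u8; 4];
    day.copy_from_slice(&bytes[8..]);
    (i64::from(u32::from_le_bytes(day)), i64::from_le_bytes(nanos))
}

/// Convert to microseconds since the Unix epoch; `None` if i64 overflows.
fn int96_to_micros(bytes: [u8; 12]) -> Option<i64> {
    let (day, nanos) = split_int96(bytes);
    (day - JULIAN_DAY_OF_EPOCH)
        .checked_mul(MICROS_PER_DAY)?
        .checked_add(nanos / 1_000)
}

/// Convert to nanoseconds since the Unix epoch; `None` if i64 overflows.
fn int96_to_nanos(bytes: [u8; 12]) -> Option<i64> {
    let (day, nanos) = split_int96(bytes);
    (day - JULIAN_DAY_OF_EPOCH)
        .checked_mul(NANOS_PER_DAY)?
        .checked_add(nanos)
}

/// Helper for building sample values in the assumed layout.
fn encode_int96(julian_day: u32, nanos_of_day: i64) -> [u8; 12] {
    let mut out = [0u8; 12];
    out[..8].copy_from_slice(&nanos_of_day.to_le_bytes());
    out[8..].copy_from_slice(&julian_day.to_le_bytes());
    out
}
```

Under these assumptions, a value one day after the epoch converts at either resolution, while `encode_int96(3_440_588, 0)` (one million days after the epoch, roughly 2,700 years) still fits as microseconds but makes `int96_to_nanos` return `None`. A 64-bit nanosecond timestamp covers only about 292 years on either side of 1970, which is the range limitation the doc comment in the diff refers to.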



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

