alamb commented on code in PR #15537:
URL: https://github.com/apache/datafusion/pull/15537#discussion_r2042825468

##########
datafusion/datasource-parquet/src/file_format.rs:
##########
@@ -569,6 +582,46 @@ pub fn apply_file_schema_type_coercions(
     ))
 }
 
+/// Coerces the file schema if the table schema uses a view type.

Review Comment:
   This comment seems out of date.

##########
datafusion/common/src/config.rs:
##########
@@ -459,6 +459,14 @@ config_namespace! {
         /// BLOB instead.
         pub binary_as_string: bool, default = false
 
+        /// (reading) If true, parquet reader will read columns of
+        /// physical type int96 as originating from a different resolution
+        /// than nanosecond. This is useful for reading data from systems like Spark
+        /// which stores microsecond resolution timestamps in an int96 allowing it
+        /// to write values with a larger date range than 64-bit timestamps with
+        /// nanosecond resolution.
+        pub coerce_int96: Option<String>, transform = str::to_lowercase, default = None

Review Comment:
   I wonder if there is any use case for int96 other than timestamps.

   Specifically, maybe we can simply always change the behavior and coerce int96 --> microseconds.

   At the very least, perhaps default the option to be enabled.
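
   For context on why the resolution matters, here is a minimal sketch of the range problem the doc comment describes. It assumes the usual INT96 timestamp layout (a Julian day number plus nanoseconds within that day). The helper names `int96_to_nanos` and `int96_to_micros` are hypothetical and written only for illustration; they are not the parquet crate's or DataFusion's API.

```rust
/// Julian day number of the Unix epoch (1970-01-01).
const JULIAN_DAY_OF_EPOCH: i64 = 2_440_588;
const NANOS_PER_DAY: i64 = 86_400_000_000_000;
const MICROS_PER_DAY: i64 = 86_400_000_000;

/// Hypothetical helper: convert the two INT96 fields to nanoseconds since
/// the Unix epoch. Returns `None` when the result does not fit in an i64.
fn int96_to_nanos(julian_day: i64, nanos_of_day: i64) -> Option<i64> {
    (julian_day - JULIAN_DAY_OF_EPOCH)
        .checked_mul(NANOS_PER_DAY)?
        .checked_add(nanos_of_day)
}

/// Hypothetical helper: the same conversion at microsecond resolution.
/// Sub-microsecond digits are truncated, which loses nothing when the
/// writer only stored microsecond precision in the first place.
fn int96_to_micros(julian_day: i64, nanos_of_day: i64) -> Option<i64> {
    (julian_day - JULIAN_DAY_OF_EPOCH)
        .checked_mul(MICROS_PER_DAY)?
        .checked_add(nanos_of_day / 1_000)
}
```

   An i64 of nanoseconds covers only about 106,751 days on either side of the epoch, so a day offset of 376,200 (the year 3000) overflows in `int96_to_nanos` but fits comfortably in `int96_to_micros`. That is the trade-off behind coercing to microseconds: a far wider date range in exchange for the sub-microsecond digits.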