mbutrovich opened a new pull request, #73:
URL: https://github.com/apache/parquet-testing/pull/73
We are adding Spark-compatible int96 support to [DataFusion Comet](https://github.com/apache/datafusion-comet) when using arrow-rs's Parquet reader. To achieve this, we first added support for [arrow-rs to read int96 at resolutions other than nanosecond](https://github.com/apache/arrow-rs/pull/7285); previously, it generated nulls for non-null values. Next, we will add support to DataFusion for generating the schema that arrow-rs needs in order to read int96 at the resolution Spark expects (a reader-level sketch of that mechanism follows below). Finally, we will connect everything in DataFusion Comet for accelerated Parquet reading of int96 values. We would like to test compatibility across all of these projects, and DataFusion and arrow-rs rely on this repo for Parquet files to test against. Please see the included markdown file for details about the new file.

Please let me know if you think it would be helpful to mention that this type is now deprecated, and that we are merely offering it for systems that want to maintain compatibility with Spark (which still defaults to writing this type for timestamps).

**Additional context (taken from https://github.com/apache/arrow-rs/issues/7220)**

- Please see https://github.com/apache/datafusion/issues/7958 for relevant discussion from 2023.
- Interpreting INT96 as a timestamp can be tough: it depends on the [Spark config](https://spark.apache.org/docs/latest/configuration.html) and the [Spark version](https://kontext.tech/article/1062/spark-2x-to-3x-date-timestamp-and-int96-rebase-modes), and there still seems to be debate over whether arithmetic during the conversion should wrap on overflow (see the decoding sketch at the end).
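For context on the schema-driven approach, here is a minimal sketch of how a consumer might ask arrow-rs to read an int96 column at microsecond resolution (Spark's timestamp unit) by supplying an Arrow schema hint through `ArrowReaderOptions::with_schema`. The file name and column name are illustrative only, and whether arrow-rs PR #7285 is wired up through exactly this path is my assumption, not a claim about its implementation.

```rust
use std::fs::File;
use std::sync::Arc;

use arrow_schema::{DataType, Field, Schema, TimeUnit};
use parquet::arrow::arrow_reader::{ArrowReaderOptions, ParquetRecordBatchReaderBuilder};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Hypothetical file containing a Spark-written int96 timestamp column.
    let file = File::open("int96_from_spark.parquet")?;

    // Schema hint: ask for microsecond timestamps instead of the
    // nanosecond default. Column name "ts" is illustrative.
    let hint = Arc::new(Schema::new(vec![Field::new(
        "ts",
        DataType::Timestamp(TimeUnit::Microsecond, None),
        true,
    )]));

    let options = ArrowReaderOptions::new().with_schema(hint);
    let reader = ParquetRecordBatchReaderBuilder::try_new_with_options(file, options)?.build()?;
    for batch in reader {
        println!("{} rows", batch?.num_rows());
    }
    Ok(())
}
```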
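And for readers unfamiliar with the physical layout, a minimal sketch of the INT96-to-timestamp conversion itself, independent of the arrow-rs implementation. INT96 stores nanoseconds within the day in the first 8 bytes (little-endian) and the Julian day number in the last 4 bytes; Julian day 2,440,588 is 1970-01-01. The wrapping arithmetic and truncating division below are one possible choice, which is exactly the kind of behavior still under debate:

```rust
const JULIAN_DAY_OF_UNIX_EPOCH: i64 = 2_440_588;
const NANOS_PER_DAY: i64 = 86_400 * 1_000_000_000;

/// Convert a raw 12-byte INT96 value to nanoseconds since the Unix epoch.
/// Uses wrapping arithmetic; whether overflow should wrap or error is one
/// of the open questions noted above.
fn int96_to_nanos(bytes: [u8; 12]) -> i64 {
    let nanos_of_day = i64::from_le_bytes(bytes[0..8].try_into().unwrap());
    let julian_day = u32::from_le_bytes(bytes[8..12].try_into().unwrap()) as i64;
    (julian_day - JULIAN_DAY_OF_UNIX_EPOCH)
        .wrapping_mul(NANOS_PER_DAY)
        .wrapping_add(nanos_of_day)
}

/// Read the same value at microsecond resolution. Note this truncates
/// toward zero; pre-epoch values may instead want floor semantics,
/// another point where implementations can disagree.
fn int96_to_micros(bytes: [u8; 12]) -> i64 {
    int96_to_nanos(bytes) / 1_000
}

fn main() {
    // Julian day 2_440_588 with zero nanos-of-day decodes to the epoch.
    let mut raw = [0u8; 12];
    raw[8..12].copy_from_slice(&2_440_588u32.to_le_bytes());
    assert_eq!(int96_to_nanos(raw), 0);
    assert_eq!(int96_to_micros(raw), 0);
    println!("epoch decodes to {} ns", int96_to_nanos(raw));
}
```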
