mbutrovich opened a new pull request, #73:
URL: https://github.com/apache/parquet-testing/pull/73

   We are adding Spark-compatible INT96 support to [DataFusion 
Comet](https://github.com/apache/datafusion-comet) when using arrow-rs's 
Parquet reader. To achieve this, we first added support in [arrow-rs for reading 
INT96 at resolutions other than 
nanosecond](https://github.com/apache/arrow-rs/pull/7285); previously, arrow-rs 
would produce nulls for non-null values. Next, we will add support to DataFusion 
for generating the schema that arrow-rs needs to read INT96 at the resolution 
Spark expects. Finally, we will connect everything in DataFusion Comet 
for accelerated Parquet reading of INT96 values. We would like to test 
compatibility in all of these projects, and both DataFusion and arrow-rs rely on 
this repo for Parquet files to test against.
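   To make the resolution point concrete, here is a minimal Rust sketch of decoding an INT96 value. It is not arrow-rs's implementation. It assumes the layout Spark and Impala use: the first 8 bytes are nanoseconds within the day as a little-endian i64, and the last 4 bytes are the Julian day as a little-endian i32. It also assumes that Julian day 2,440,588 is 1970-01-01.

`Unit`, `make_int96`, and `int96_to_timestamp` are hypothetical names used only for this illustration. The sketch shows that the requested unit matters. An i64 nanosecond timestamp runs out in April 2262, but microseconds cover a much wider range. As a result, a value can fit one unit and overflow the other.

```rust
/// Julian day number of the Unix epoch (1970-01-01), as assumed for INT96.
const JULIAN_DAY_OF_EPOCH: i64 = 2_440_588;
const SECONDS_PER_DAY: i64 = 86_400;

/// Hypothetical target resolutions for this sketch.
#[derive(Clone, Copy)]
enum Unit {
    Microsecond,
    Nanosecond,
}

/// Build an INT96 value from its two parts (test helper for this sketch).
fn make_int96(julian_day: i32, nanos_of_day: i64) -> [u8; 12] {
    let mut out = [0u8; 12];
    out[0..8].copy_from_slice(&nanos_of_day.to_le_bytes());
    out[8..12].copy_from_slice(&julian_day.to_le_bytes());
    out
}

/// Split an INT96 value into (Julian day, nanoseconds within the day).
fn split_int96(bytes: [u8; 12]) -> (i64, i64) {
    let mut n = [0u8; 8];
    n.copy_from_slice(&bytes[0..8]);
    let mut d = [0u8; 4];
    d.copy_from_slice(&bytes[8..12]);
    (i32::from_le_bytes(d) as i64, i64::from_le_bytes(n))
}

/// Convert to a Unix timestamp in `unit`, returning None if it overflows i64.
fn int96_to_timestamp(bytes: [u8; 12], unit: Unit) -> Option<i64> {
    let (day, nanos) = split_int96(bytes);
    let seconds = (day - JULIAN_DAY_OF_EPOCH).checked_mul(SECONDS_PER_DAY)?;
    match unit {
        Unit::Nanosecond => seconds.checked_mul(1_000_000_000)?.checked_add(nanos),
        // Truncate sub-microsecond precision when reading at microsecond resolution.
        Unit::Microsecond => seconds.checked_mul(1_000_000)?.checked_add(nanos / 1_000),
    }
}

fn main() {
    // 1970-01-02 00:00:01.5 fits both units.
    let ts = make_int96(2_440_589, 1_500_000_000);
    println!("{:?}", int96_to_timestamp(ts, Unit::Nanosecond)); // Some(86401500000000)
    println!("{:?}", int96_to_timestamp(ts, Unit::Microsecond)); // Some(86401500000)

    // About 200,000 days after the epoch: past the i64 nanosecond limit.
    let far = make_int96(2_640_588, 0);
    println!("{:?}", int96_to_timestamp(far, Unit::Nanosecond)); // None
    println!("{:?}", int96_to_timestamp(far, Unit::Microsecond)); // Some(17280000000000000)
}
```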
   
   Please see the included markdown file for details about the file. Please 
let me know if you think it would be helpful to mention that this type is 
deprecated and that we are offering it only for systems that want to maintain 
compatibility with Spark (which still defaults to writing this type for 
timestamps).
   
   **Additional context (taken from 
https://github.com/apache/arrow-rs/issues/7220)**
   
   - Please see https://github.com/apache/datafusion/issues/7958 for relevant 
discussion from 2023.
   - Interpreting INT96 as a timestamp can be difficult: it depends on the [Spark 
config](https://spark.apache.org/docs/latest/configuration.html) and the [Spark 
version](https://kontext.tech/article/1062/spark-2x-to-3x-date-timestamp-and-int96-rebase-modes), 
and there still seems to be debate on whether arithmetic during conversion 
should wrap on overflow.
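   The overflow question in the last bullet can be illustrated with two hypothetical conversion functions. They are sketched in Rust under the standard INT96 layout (Julian day plus nanoseconds within the day), with Julian day 2,440,588 taken as the Unix epoch.

Neither `to_nanos_wrapping` nor `to_nanos_checked` comes from Spark, arrow-rs, or DataFusion. They only show how the two policies diverge for a date past the i64 nanosecond range.

```rust
/// Julian day number of the Unix epoch (1970-01-01), as assumed for INT96.
const JULIAN_DAY_OF_EPOCH: i64 = 2_440_588;
const NANOS_PER_DAY: i64 = 86_400 * 1_000_000_000;

/// Wrapping policy: always returns a value, silently wrapping on overflow.
fn to_nanos_wrapping(julian_day: i32, nanos_of_day: i64) -> i64 {
    (julian_day as i64 - JULIAN_DAY_OF_EPOCH)
        .wrapping_mul(NANOS_PER_DAY)
        .wrapping_add(nanos_of_day)
}

/// Checked policy: reports overflow as None instead of wrapping.
fn to_nanos_checked(julian_day: i32, nanos_of_day: i64) -> Option<i64> {
    (julian_day as i64 - JULIAN_DAY_OF_EPOCH)
        .checked_mul(NANOS_PER_DAY)?
        .checked_add(nanos_of_day)
}

fn main() {
    // 1970-01-02: in range, so both policies agree.
    println!("{:?}", to_nanos_checked(2_440_589, 0));
    println!("{}", to_nanos_wrapping(2_440_589, 0));

    // About 200,000 days after the epoch: overflows i64 nanoseconds.
    println!("{:?}", to_nanos_checked(2_640_588, 0)); // None
    println!("{}", to_nanos_wrapping(2_640_588, 0)); // -1166744073709551616
}
```

With wrapping, every input produces a value, but a far-future date can silently become one before 1970. With checked arithmetic, the overflow is surfaced, and a reader could then map it to an error or a null.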
