mbutrovich commented on PR #15537: URL: https://github.com/apache/datafusion/pull/15537#issuecomment-2802943864
> I checked that the data seems to come out ok with datafusion 46. Can you remind me what the different would be with this option (that the timestamp type is different?)
>
> ```
> andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion/parquet-testing$ ~/Software/datafusion-cli/datafusion-cli-46.0.0 -c "select * from 'data/int96_from_spark.parquet'";
> DataFusion CLI v46.0.0
> +-------------------------------+
> | a                             |
> +-------------------------------+
> | 2024-01-01T20:34:56.123456    |
> | 2024-01-01T01:00:00           |
> | 1816-03-29T08:56:08.066277376 |
> | 2024-12-30T23:00:00           |
> | NULL                          |
> | 2147-08-27T00:35:19.850745856 |
> +-------------------------------+
> 6 row(s) fetched.
> Elapsed 0.007 seconds.
> ```

Without coercion:

```
matt@Matthews-MacBook-Pro parquet-testing % ../target/debug/datafusion-cli -c "select * from 'data/int96_from_spark.parquet'";
DataFusion CLI v46.0.1
+-------------------------------+
| a                             |
+-------------------------------+
| 2024-01-01T20:34:56.123456    |
| 2024-01-01T01:00:00           |
| 1816-03-29T08:56:08.066277376 |
| 2024-12-30T23:00:00           |
| NULL                          |
| 1815-11-08T16:01:01.191053312 |
+-------------------------------+
6 row(s) fetched.
Elapsed 0.006 seconds.
```

With coercion:

```
matt@Matthews-MacBook-Pro parquet-testing % ../target/debug/datafusion-cli -c "set datafusion.execution.parquet.coerce_int96 to 'us'; select * from 'data/int96_from_spark.parquet'";
DataFusion CLI v46.0.1
0 row(s) fetched.
Elapsed 0.001 seconds.

+----------------------------+
| a                          |
+----------------------------+
| 2024-01-01T20:34:56.123456 |
| 2024-01-01T01:00:00        |
| NULL                       |
| 2024-12-30T23:00:00        |
| NULL                       |
| NULL                       |
+----------------------------+
6 row(s) fetched.
Elapsed 0.005 seconds.
```

Frustratingly, those two new nulls aren't really nulls. They're the challenging values that we want to be able to read back in Comet. However, [we can't print them with current chrono behavior](https://github.com/apache/arrow-rs/issues/7287) which is why I didn't test at the SQL layer.
However, the real values are in there and we'll be able to do what we need to do in Comet at the SchemaAdapter level with this change.
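
For readers wondering why the unit matters, here is a rough Python sketch of the arithmetic, not the arrow-rs or DataFusion code. It assumes the usual INT96 layout (8 little-endian bytes of nanoseconds within the day followed by a 4-byte Julian day) and assumes that overflow in a 64-bit nanosecond timestamp wraps around. The dates 9999-12-31 and 0001-01-01 are hypothetical stand-ins, not the actual values in `int96_from_spark.parquet`, and every function name is made up for illustration.

```python
import struct
from datetime import date

JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588  # Julian day number of 1970-01-01
NANOS_PER_DAY = 86_400 * 1_000_000_000
MICROS_PER_DAY = 86_400 * 1_000_000
I64_MIN, I64_MAX = -(2**63), 2**63 - 1


def encode_int96(day, nanos_of_day=0):
    """Pack a calendar date as INT96: 8 bytes nanos-of-day, then 4 bytes Julian day."""
    days_since_epoch = day.toordinal() - date(1970, 1, 1).toordinal()
    return struct.pack("<Qi", nanos_of_day, days_since_epoch + JULIAN_DAY_OF_UNIX_EPOCH)


def decode_int96(raw):
    """Return (days since the Unix epoch, nanoseconds within the day)."""
    nanos_of_day, julian_day = struct.unpack("<Qi", raw)
    return julian_day - JULIAN_DAY_OF_UNIX_EPOCH, nanos_of_day


def fits_i64(value):
    """True if the value is representable as a signed 64-bit integer."""
    return I64_MIN <= value <= I64_MAX


def wrap_i64(value):
    """Two's-complement wraparound (an assumption about how overflow behaves)."""
    return (value - I64_MIN) % 2**64 + I64_MIN


def epoch_nanos(raw):
    """Exact nanoseconds since the epoch as an unbounded Python int."""
    days, nanos = decode_int96(raw)
    return days * NANOS_PER_DAY + nanos


def epoch_micros(raw):
    """Exact microseconds since the epoch, truncating sub-microsecond digits."""
    days, nanos = decode_int96(raw)
    return days * MICROS_PER_DAY + nanos // 1_000


if __name__ == "__main__":
    # Hypothetical dates: one ordinary, two far outside the i64 nanosecond range.
    for day in (date(2024, 1, 1), date(9999, 12, 31), date(1, 1, 1)):
        raw = encode_int96(day)
        ns, us = epoch_nanos(raw), epoch_micros(raw)
        print(day, "ns fits:", fits_i64(ns), "us fits:", fits_i64(us),
              "wrapped ns:", wrap_i64(ns))
```

A signed 64-bit count of nanoseconds only spans about 292 years on either side of 1970, so dates well outside that window cannot survive as nanosecond timestamps, while the same dates fit comfortably once the unit is microseconds.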