[I] When `datafusion.execution.parquet.coerce_int96` is set, timestamp type is still reported as Timestamp(nanoseconds) [datafusion]

via GitHub Tue, 15 Apr 2025 11:24:09 -0700


alamb opened a new issue, #15721:
URL: https://github.com/apache/datafusion/issues/15721


   ### Describe the bug
   
   The documentation for `datafusion.execution.parquet.coerce_int96` says:
   
   > If true, parquet reader will read columns of physical type int96 as
   > originating from a different resolution than nanosecond. This is useful for
   > reading data from systems like Spark which stores microsecond resolution
   > timestamps in an int96 allowing it to write values with a larger date range
   > than 64-bit timestamps with nanosecond resolution.
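
For context on the "larger date range" claim, here is a rough sketch of the arithmetic in plain Python. It is not DataFusion code; the helper names `nanosecond_bounds` and `reach_in_years` are made up for illustration, and the year length is the mean Gregorian year, so the figures are approximate.

```python
from datetime import datetime, timedelta, timezone

I64_MAX = 2**63 - 1  # largest value a signed 64-bit timestamp can hold
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
SECONDS_PER_YEAR = 365.2425 * 86_400  # mean Gregorian year


def nanosecond_bounds():
    """Approximate first and last instants that fit in i64 nanoseconds.

    Python datetimes only have microsecond precision, so the bounds are
    truncated toward the epoch to the nearest microsecond.
    """
    max_micros = I64_MAX // 1_000
    span = timedelta(microseconds=max_micros)
    return EPOCH - span, EPOCH + span


def reach_in_years(ticks_per_second):
    """Years on either side of the epoch that an i64 tick count can cover."""
    return I64_MAX / ticks_per_second / SECONDS_PER_YEAR


earliest, latest = nanosecond_bounds()
print("i64 nanoseconds:", earliest.isoformat(), "to", latest.isoformat())
print("nanosecond reach (years): ", round(reach_in_years(10**9)))
print("microsecond reach (years):", round(reach_in_years(10**6)))
```

A 64-bit nanosecond timestamp only reaches from 1677 to 2262, while the same 64 bits at microsecond resolution reach about a thousand times further. This is why the reported unit matters for Spark-written files and not only the stored values.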
   
   However, when I set this to `us` (microseconds), the type is still reported
   as `Timestamp(Nanosecond, None)`.
   
   ### To Reproduce
   
   ```sql
   -- Enable coercion of int96 to microseconds
   set datafusion.execution.parquet.coerce_int96 = us;
   
   -- Create external table
   CREATE EXTERNAL TABLE int96_from_spark
   STORED AS PARQUET
   LOCATION 'parquet-testing/data/int96_from_spark.parquet';
   
   -- Print schema
   describe int96_from_spark;
   ```
   
   Results in 
   
   ```text
   +-------------+-----------------------------+-------------+
   | column_name | data_type                   | is_nullable |
   +-------------+-----------------------------+-------------+
   | a           | Timestamp(Nanosecond, None) | YES         |
   +-------------+-----------------------------+-------------+
   1 row(s) fetched.
   Elapsed 0.001 seconds.
   ```
   
   ### Expected behavior
   
   I expect the output type to be `Timestamp(Microsecond, None)`
   
   ### Additional context
   
   - The new feature was added in https://github.com/apache/datafusion/pull/15537
   - Possibly related to https://github.com/apache/arrow-rs/issues/7287
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


