andrei-ionescu commented on issue #1360:
URL:
https://github.com/apache/arrow-datafusion/issues/1360#issuecomment-979944891
Thank you @jorgecarleitao!
I do understand that nanosecond timestamps, which are expressed as `INT96` in Parquet and supported by other frameworks, do not fit into the Rust `i64` type, hence the `overflow` panic.
From my calculations, the current `i64`-based implementation can hold timestamps up to `2262-04-11T23:47:16.854775807Z` with nanosecond precision. Since `9999-12-31` is more than 7,700 years past that maximum, reading it fails with an overflow panic.
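As a quick sanity check, here is a minimal, hedged sketch that derives that maximum date from `i64::MAX` nanoseconds and shows why `9999-12-31` cannot be scaled to nanoseconds without overflowing. It assumes the `chrono` crate (0.4.31 or later), which is used here purely for printing the date and is not part of the DataFusion code path being discussed:

```rust
// Minimal sketch: derive the largest date representable as i64 nanoseconds
// since the Unix epoch, and show that 9999-12-31 overflows that range.
// Assumes the `chrono` crate (0.4.31+) only for pretty-printing the date.
use chrono::DateTime;

fn main() {
    let max_ns = i64::MAX; // 9_223_372_036_854_775_807
    let secs = max_ns / 1_000_000_000;
    let nanos = (max_ns % 1_000_000_000) as u32;
    let max_ts = DateTime::from_timestamp(secs, nanos).expect("within chrono's range");
    // Prints: 2262-04-11 23:47:16.854775807 UTC
    println!("max nanosecond timestamp: {max_ts}");

    // 9999-12-31T00:00:00Z is 253_402_214_400 seconds after the epoch;
    // scaling that to nanoseconds overflows i64.
    let secs_9999: i64 = 253_402_214_400;
    assert!(secs_9999.checked_mul(1_000_000_000).is_none());
}
```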
Now, what can we do?
1. Improve DataFusion to support higher precision by switching to the `i128` Rust type (there is no `i96` in Rust), which would remove the panic entirely. This may require extensive effort to implement (I'm no expert in `datafusion` or `arrow-rs/parquet`, but I want to learn; my background is distributed processing with Spark and Scala). See the first sketch after this list for a rough idea of what the decoding could look like.
2. Keep the `i64` limitation and:
   - clearly state in the docs that DataFusion supports nanosecond precision only up to `2262-04-11`, and that reading Parquet files with timestamp values beyond that will fail;
   - improve the panic error message so it is clear that the data is out of range and which value is the culprit (this is very useful when you have a large dataset and need to find the needle in the haystack); see the second sketch after this list.
3. Do what Spark and other frameworks do.
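For option 1, a hedged, standalone sketch of decoding a Parquet `INT96` value into `i128` nanoseconds, so that dates past `2262-04-11` no longer overflow. The layout assumed here (8 little-endian bytes of nanoseconds-of-day followed by a 4-byte little-endian Julian day, as written by Spark/Impala) and the function name `int96_to_nanos_i128` are illustrative only; where this would plug into `arrow-rs`/`datafusion` is left open:

```rust
// Illustrative only: decode Parquet INT96 (nanos-of-day + Julian day) into
// i128 nanoseconds since the Unix epoch, which cannot overflow for any
// date Parquet can store.
const JULIAN_DAY_OF_UNIX_EPOCH: i128 = 2_440_588;
const NANOS_PER_DAY: i128 = 86_400 * 1_000_000_000;

fn int96_to_nanos_i128(raw: [u8; 12]) -> i128 {
    // First 8 bytes: nanoseconds within the day (little-endian).
    let nanos_of_day = u64::from_le_bytes(raw[0..8].try_into().unwrap()) as i128;
    // Last 4 bytes: Julian day number (little-endian).
    let julian_day = u32::from_le_bytes(raw[8..12].try_into().unwrap()) as i128;
    (julian_day - JULIAN_DAY_OF_UNIX_EPOCH) * NANOS_PER_DAY + nanos_of_day
}

fn main() {
    // The Unix epoch itself: Julian day 2_440_588, zero nanoseconds into the day.
    let mut raw = [0u8; 12];
    raw[8..12].copy_from_slice(&2_440_588u32.to_le_bytes());
    assert_eq!(int96_to_nanos_i128(raw), 0);
    println!("epoch as i128 nanos: {}", int96_to_nanos_i128(raw));
}
```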
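And for the second bullet of option 2, a hypothetical sketch (not the actual `arrow-rs`/`datafusion` code path or API) of what a more helpful conversion could look like: the checked multiplication that currently panics would instead return an error naming the offending value, so the bad row can actually be found in a large dataset:

```rust
// Hypothetical helper, not the real arrow-rs/datafusion API: convert a
// seconds-based timestamp to nanoseconds, reporting the culprit value
// instead of panicking with a bare arithmetic overflow.
fn seconds_to_nanos(secs: i64) -> Result<i64, String> {
    secs.checked_mul(1_000_000_000).ok_or_else(|| {
        format!(
            "timestamp {secs}s is out of range for nanosecond precision \
             (maximum is 2262-04-11T23:47:16.854775807Z)"
        )
    })
}

fn main() {
    // 9999-12-31T00:00:00Z as Unix seconds: overflows when scaled to nanos.
    match seconds_to_nanos(253_402_214_400) {
        Ok(ns) => println!("nanos: {ns}"),
        Err(e) => eprintln!("error: {e}"),
    }
}
```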
I hope I'm not being a nuisance. 😀