andrei-ionescu commented on issue #1360:
URL:
https://github.com/apache/arrow-datafusion/issues/1360#issuecomment-979944891
Thank you @jorgecarleitao!
I do understand that nanosecond timestamps, which are expressed as `INT96` in Parquet and supported by other frameworks, do not fit into the Rust `i64` type, hence the `overflow` panic.
From my calculations, the current `i64`-based implementation can hold timestamps up to `2262-04-11T23:47:16.854775807Z` with nanosecond precision. Since `9999-12-31` is more than 7,700 years past that maximum, reading it fails with an overflow panic.
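As a quick sanity check, here is a minimal, hedged sketch that derives that maximum date from `i64::MAX` nanoseconds and shows why `9999-12-31` cannot be scaled to nanoseconds without overflowing. It assumes the `chrono` crate (0.4.31 or later), which is used here purely for printing the date and is not part of the DataFusion code path being discussed:

```rust
// Minimal sketch: derive the largest date representable as i64 nanoseconds
// since the Unix epoch, and show that 9999-12-31 overflows that range.
// Assumes the `chrono` crate (0.4.31+) only for pretty-printing the date.
use chrono::DateTime;

fn main() {
    let max_ns = i64::MAX; // 9_223_372_036_854_775_807
    let secs = max_ns / 1_000_000_000;
    let nanos = (max_ns % 1_000_000_000) as u32;
    let max_ts = DateTime::from_timestamp(secs, nanos).expect("within chrono's range");
    // Prints: 2262-04-11 23:47:16.854775807 UTC
    println!("max nanosecond timestamp: {max_ts}");

    // 9999-12-31T00:00:00Z is 253_402_214_400 seconds after the epoch;
    // scaling that to nanoseconds overflows i64.
    let secs_9999: i64 = 253_402_214_400;
    assert!(secs_9999.checked_mul(1_000_000_000).is_none());
}
```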
Now, what can we do?
1. Improve DataFusion to support higher precision by switching to the `i128` Rust type (there is no `i96` in Rust), which would remove the panic entirely. This may require extensive effort to implement (I'm no expert in `datafusion` or `arrow-rs/parquet`, but I want to learn; my background is distributed processing with Spark and Scala). See the first sketch after this list for a rough idea of what the decoding could look like.
2. Keep the `i64` limitation and:
   - clearly state in the docs that DataFusion supports nanosecond precision only up to `2262-04-11`, and that reading Parquet files with timestamp values beyond that will fail;
   - improve the panic error message so it is clear that the data is out of range and which value is the culprit (this is very useful when you have a large dataset and need to find the needle in the haystack); see the second sketch after this list.
3. Do what Spark and other frameworks do.
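For option 1, a hedged, standalone sketch of decoding a Parquet `INT96` value into `i128` nanoseconds, so that dates past `2262-04-11` no longer overflow. The layout assumed here (8 little-endian bytes of nanoseconds-of-day followed by a 4-byte little-endian Julian day, as written by Spark/Impala) and the function name `int96_to_nanos_i128` are illustrative only; where this would plug into `arrow-rs`/`datafusion` is left open:

```rust
// Illustrative only: decode Parquet INT96 (nanos-of-day + Julian day) into
// i128 nanoseconds since the Unix epoch, which cannot overflow for any
// date Parquet can store.
const JULIAN_DAY_OF_UNIX_EPOCH: i128 = 2_440_588;
const NANOS_PER_DAY: i128 = 86_400 * 1_000_000_000;

fn int96_to_nanos_i128(raw: [u8; 12]) -> i128 {
    // First 8 bytes: nanoseconds within the day (little-endian).
    let nanos_of_day = u64::from_le_bytes(raw[0..8].try_into().unwrap()) as i128;
    // Last 4 bytes: Julian day number (little-endian).
    let julian_day = u32::from_le_bytes(raw[8..12].try_into().unwrap()) as i128;
    (julian_day - JULIAN_DAY_OF_UNIX_EPOCH) * NANOS_PER_DAY + nanos_of_day
}

fn main() {
    // The Unix epoch itself: Julian day 2_440_588, zero nanoseconds into the day.
    let mut raw = [0u8; 12];
    raw[8..12].copy_from_slice(&2_440_588u32.to_le_bytes());
    assert_eq!(int96_to_nanos_i128(raw), 0);
    println!("epoch as i128 nanos: {}", int96_to_nanos_i128(raw));
}
```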
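And for the second bullet of option 2, a hypothetical sketch (not the actual `arrow-rs`/`datafusion` code path or API) of what a more helpful conversion could look like: the checked multiplication that currently panics would instead return an error naming the offending value, so the bad row can actually be found in a large dataset:

```rust
// Hypothetical helper, not the real arrow-rs/datafusion API: convert a
// seconds-based timestamp to nanoseconds, reporting the culprit value
// instead of panicking with a bare arithmetic overflow.
fn seconds_to_nanos(secs: i64) -> Result<i64, String> {
    secs.checked_mul(1_000_000_000).ok_or_else(|| {
        format!(
            "timestamp {secs}s is out of range for nanosecond precision \
             (maximum is 2262-04-11T23:47:16.854775807Z)"
        )
    })
}

fn main() {
    // 9999-12-31T00:00:00Z as Unix seconds: overflows when scaled to nanos.
    match seconds_to_nanos(253_402_214_400) {
        Ok(ns) => println!("nanos: {ns}"),
        Err(e) => eprintln!("error: {e}"),
    }
}
```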
I hope I'm not being a nuisance. 😀