jorgecarleitao edited a comment on issue #1360: URL: https://github.com/apache/arrow-datafusion/issues/1360#issuecomment-979478467
Point taken wrt the int96 deprecation.

The datetime "9999-12-31" is `253402214400` seconds in unix timestamp:

```bash
$ python -c "import datetime; print(datetime.datetime(year=9999,month=12,day=31).timestamp())"
253402214400.0
```

In nanoseconds, this corresponds to `253402214400 * 10^9 = 253_402_214_400_000_000_000`. The maximum `i64` in Rust [equals](https://doc.rust-lang.org/std/i64/constant.MAX.html) `9_223_372_036_854_775_807`. Comparing the two, we have:

```
253_402_214_400_000_000_000 > 9_223_372_036_854_775_807
```

This was the rationale I used to conclude that we can't fit "9999-12-31" in an `i64` as nanoseconds since the epoch. Since Java's `Long` is also an `i64` with the same maximum as Rust's, I concluded that Spark must be discarding _something_ to fit such a date in a `Long`, since there is simply not enough precision to represent that date in `i64` nanoseconds. So, I looked for what they did.

`int96` represents `[i64 nanos, i32 days]`. When reading such bytes from parquet, the interface that Spark uses must be something that consumes such types, and `fromJulianDay(days: Int, nanos: Long)` is the only one that does. As I mentioned, that code truncates the nanoseconds, which is consistent with being able to read that date _in microseconds_: the two numbers above differ by less than a factor of 1000, so dropping from nanoseconds to microseconds brings the value back within `i64` range.

I may be wrong. The parquet code in Rust is [here](https://github.com/apache/arrow-rs/blob/master/parquet/src/data_type.rs#L65). Note that it only goes to millis. The conversion to ns is done [here](https://github.com/apache/arrow-rs/blob/master/parquet/src/arrow/converter.rs#L179).
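The overflow argument above can be checked directly with Rust's checked arithmetic (a standalone sketch, not code from either repository):

```rust
fn main() {
    // 9999-12-31T00:00:00 UTC as seconds since the Unix epoch.
    let seconds: i64 = 253_402_214_400;

    // Converting to nanoseconds overflows i64: checked_mul returns None.
    assert_eq!(seconds.checked_mul(1_000_000_000), None);

    // Converting to microseconds fits comfortably.
    assert_eq!(seconds.checked_mul(1_000_000), Some(253_402_214_400_000_000));
    assert!(253_402_214_400_000_000_i64 < i64::MAX);
}
```

This confirms the conclusion: the date is representable in `i64` microseconds but not in `i64` nanoseconds.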
