jorgecarleitao edited a comment on issue #1360:
URL: 
https://github.com/apache/arrow-datafusion/issues/1360#issuecomment-979478467


   Point taken wrt the int96 deprecation.
   
   The datetime "9999-12-31" (midnight UTC) is `253402214400` seconds as a Unix timestamp:
   
   ```bash
   $ python -c "import datetime; print(datetime.datetime(year=9999, month=12, day=31, tzinfo=datetime.timezone.utc).timestamp())"
   253402214400.0
   ```
   
   In nanoseconds, this corresponds to `253402214400 * 10^9 = 
253_402_214_400_000_000_000`. The maximum `i64` in Rust 
[is](https://doc.rust-lang.org/std/i64/constant.MAX.html) 
`9_223_372_036_854_775_807i64`. Comparing the two, we have:
   
   ```
   253_402_214_400_000_000_000 >
     9_223_372_036_854_775_807
   ```
   
   This was the rationale I used to conclude that we can't fit "9999-12-31" in 
an i64 of nanoseconds since the epoch. Since Java's Long is also an i64 with the 
same maximum as Rust's, I concluded that Spark must be discarding _something_ to 
fit such a date in a Long, since an i64 simply does not have the range to 
represent that date in ns. So, I looked for what they did.
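   
   The overflow argument above can be checked directly in Rust. This is only a 
minimal sketch: `scale_seconds` and the constant name are hypothetical helpers 
made up for illustration, not anything from arrow-rs or Spark.

```rust
/// Seconds since the Unix epoch for 9999-12-31T00:00:00Z (the value quoted above).
const SECS_9999_12_31: i64 = 253_402_214_400;

/// Hypothetical helper: scale whole seconds to a finer unit,
/// returning `None` if the result does not fit in an i64.
fn scale_seconds(secs: i64, units_per_sec: i64) -> Option<i64> {
    secs.checked_mul(units_per_sec)
}

fn main() {
    // Nanoseconds: 253_402_214_400 * 10^9 exceeds i64::MAX, so this is None.
    let nanos = scale_seconds(SECS_9999_12_31, 1_000_000_000);
    // Microseconds: 253_402_214_400 * 10^6 still fits in an i64.
    let micros = scale_seconds(SECS_9999_12_31, 1_000_000);

    println!("as ns: {:?}", nanos);
    println!("as us: {:?}", micros);
    println!("i64::MAX = {}", i64::MAX);
}
```

   Using `checked_mul` rather than `*` makes the overflow explicit as `None` 
instead of a panic (debug builds) or a silent wrap (release builds).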
   
   `int96` is represented as `[i64 nanos, i32 days]`. When reading such bytes 
from parquet, the interface that Spark uses must be something that consumes 
such types, and `fromJulianDay(days: Int, nanos: Long)` is the only one that 
does such a thing. As I mentioned, that code truncates the nanoseconds, which 
is consistent with being able to read that date _in microseconds_ (the two 
numbers above differ by a factor of less than 1000).
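   
   To illustrate why truncating to microseconds is enough, here is a rough 
sketch of an int96-style conversion. It is an assumption-laden illustration, 
not Spark's actual `fromJulianDay` nor the arrow-rs code: the function names 
are hypothetical, and it assumes `days` is a Julian day number and `nanos` is 
the nanoseconds elapsed within that day.

```rust
/// Julian day number of the Unix epoch (1970-01-01).
const JULIAN_DAY_OF_EPOCH: i64 = 2_440_588;
const MICROS_PER_DAY: i64 = 86_400_000_000;
const NANOS_PER_DAY: i64 = 86_400_000_000_000;

/// Hypothetical: (Julian day, nanos of day) -> microseconds since the epoch.
/// Sub-microsecond digits are truncated; `None` signals i64 overflow.
fn julian_to_micros(days: i32, nanos: i64) -> Option<i64> {
    let days_since_epoch = i64::from(days) - JULIAN_DAY_OF_EPOCH;
    days_since_epoch
        .checked_mul(MICROS_PER_DAY)?
        .checked_add(nanos / 1_000)
}

/// Hypothetical: same input, but keeping full nanosecond precision.
fn julian_to_nanos(days: i32, nanos: i64) -> Option<i64> {
    let days_since_epoch = i64::from(days) - JULIAN_DAY_OF_EPOCH;
    days_since_epoch.checked_mul(NANOS_PER_DAY)?.checked_add(nanos)
}

fn main() {
    // 9999-12-31 is 2_932_896 days after the epoch, i.e. Julian day 5_373_484.
    let day_9999_12_31: i32 = 5_373_484;
    println!("us: {:?}", julian_to_micros(day_9999_12_31, 0));
    println!("ns: {:?}", julian_to_nanos(day_9999_12_31, 0));
}
```

   Under these assumptions the microsecond conversion succeeds for 
"9999-12-31" while the nanosecond one overflows, which matches the reasoning 
above.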
   
   I may be wrong.
   
   The parquet code in Rust is 
[here](https://github.com/apache/arrow-rs/blob/master/parquet/src/data_type.rs#L65).
 Note that it only goes to millis. The conversion to ns is done 
[here](https://github.com/apache/arrow-rs/blob/master/parquet/src/arrow/converter.rs#L179).

