sachouche commented on issue #1630: DRILL-7018: Fixed Parquet buffer overflow 
when reading timestamp column
URL: https://github.com/apache/drill/pull/1630#issuecomment-459431892
 
 
   Thanks Vitali for the review @vdiravka!
   
   The loss is due to the way that Parquet INT96 encodes the timestamp 
information:
   - 4 bytes hold the Julian day and the remaining 8 bytes hold the 
nanoseconds within that day
   - Formula to convert to nanoseconds since the Unix epoch: 
`(julian_day - 2440588) * (86400 * 1000 * 1000 * 1000) + nanoseconds`
   - I believe Parquet borrowed this format from Hive / Impala; you can refer 
to this [PR](https://github.com/apache/parquet-format/pull/49)
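
   For illustration only, here is a minimal Python sketch of the conversion 
formula above. It is not Drill's reader code. The function name 
`int96_to_epoch_nanos` is hypothetical, and the byte layout (little-endian, 
8 bytes of nanoseconds within the day followed by a 4-byte Julian day) is an 
assumption based on how Hive / Impala write the value:

```python
import struct

JULIAN_DAY_OF_UNIX_EPOCH = 2440588           # Julian day number of 1970-01-01
NANOS_PER_DAY = 86400 * 1000 * 1000 * 1000   # seconds/day * ns/second


def int96_to_epoch_nanos(raw: bytes) -> int:
    """Convert a 12-byte INT96 timestamp to nanoseconds since the Unix epoch.

    Assumed layout (little-endian): 8 bytes nanoseconds within the day,
    then 4 bytes Julian day.
    """
    if len(raw) != 12:
        raise ValueError("INT96 value must be exactly 12 bytes")
    # "<qi" = little-endian, no padding: int64 followed by int32
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    return (julian_day - JULIAN_DAY_OF_UNIX_EPOCH) * NANOS_PER_DAY + nanos_of_day


# Usage: one nanosecond past midnight on the day after the epoch
sample = struct.pack("<qi", 1, JULIAN_DAY_OF_UNIX_EPOCH + 1)
print(int96_to_epoch_nanos(sample))
```

   Python integers are arbitrary precision, so the multiplication cannot 
overflow here; a Java or C++ version would need a 64-bit type for the result.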
   
   **NOTE -** My understanding is that INT64 is the preferred format, as Spark 
(a few years back) dropped INT96 support in favor of INT64.
