sachouche commented on issue #1630: DRILL-7018: Fixed Parquet buffer overflow when reading timestamp column
URL: https://github.com/apache/drill/pull/1630#issuecomment-459431892

Thanks for the review, Vitali (@vdiravka)!

The loss is due to the way Parquet INT96 encodes the timestamp information:
- 4 bytes are used for the Julian day and the remaining 8 bytes for the nanoseconds within that day
- Formula to convert to Unix epoch nanoseconds: `(julian_day - 2440588) * (86400 * 1000 * 1000 * 1000) + nanoseconds`
- I believe Parquet borrowed this format from Hive / Impala; you can refer to this [PR](https://github.com/apache/parquet-format/pull/49)

**NOTE -** My understanding is that INT64 is the preferred format, as Spark (a few years back) dropped INT96 support in favor of INT64.
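
For illustration, the conversion formula above can be sketched in Python. This is not Drill's reader code. The function names are hypothetical, and the byte layout (8 little-endian bytes of nanoseconds within the day, followed by a 4-byte little-endian Julian day) is an assumption based on the layout commonly written by Hive / Impala. The second helper only shows that truncating to milliseconds discards the sub-millisecond part of the value.

```python
import struct

# Julian day number that corresponds to 1970-01-01 (the Unix epoch).
JULIAN_DAY_OF_UNIX_EPOCH = 2440588
NANOS_PER_DAY = 86400 * 1000 * 1000 * 1000


def int96_to_epoch_nanos(raw: bytes) -> int:
    """Convert a 12-byte INT96 timestamp to nanoseconds since the Unix epoch.

    Assumed layout: bytes 0-7 hold the nanoseconds within the day and
    bytes 8-11 hold the Julian day, both little-endian.
    """
    if len(raw) != 12:
        raise ValueError("an INT96 value must be exactly 12 bytes")
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    return (julian_day - JULIAN_DAY_OF_UNIX_EPOCH) * NANOS_PER_DAY + nanos_of_day


def epoch_nanos_to_millis(epoch_nanos: int) -> int:
    """Truncate to milliseconds; anything below one millisecond is dropped."""
    return epoch_nanos // 1_000_000


if __name__ == "__main__":
    # One day after the epoch plus 1,500,000 ns (1.5 ms) into that day.
    raw = struct.pack("<qi", 1_500_000, JULIAN_DAY_OF_UNIX_EPOCH + 1)
    nanos = int96_to_epoch_nanos(raw)
    print(nanos, epoch_nanos_to_millis(nanos))
```

Python integers do not overflow, so the multiplication is safe here; in a language with fixed-width integers the same arithmetic would need 64-bit types.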
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services