ryancasburn-KAI opened a new issue, #458: URL: https://github.com/apache/parquet-format/issues/458
### Describe the enhancement requested

Hi, I'm new around here; please let me know if this request belongs elsewhere.

I'd like to propose an optional type parameter called `Offset` for TIMESTAMP logical types. In my common use case for Parquet files, the data is a running log with many rows, such that any one row group is unlikely to span more than a few days. The idea of the `Offset` parameter is to store, for each row group, an INT64 offset from the Unix epoch; the data would then be stored relative to that offset.

This provides a couple of benefits:

1. Row groups could be selectively downsized (when possible) to INT32 physical types. This could save a significant amount of file size, if I understand correctly. At millisecond precision, INT32 could support row groups up to ~49 days long (2^32 ms ≈ 49.7 days).[^1]
2. [The docs](https://github.com/apache/parquet-format/blob/4f208158dba80ff4bff4afaa4441d7270103dff6/LogicalTypes.md?plain=1#L458) note that all TIMESTAMPs, but particularly those with NANOS precision, have range limitations because they are stored as INT64. Adding an `Offset` would allow practically unlimited ranges for TIMESTAMPs.

[^1]: With the offset set in the middle of the row group's values, given the signed nature of INT32.
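
The INT32 arithmetic behind the proposal can be checked with a short sketch. This is only an illustration under assumptions: the function names `encode_row_group` and `decode_row_group` are hypothetical and not part of any Parquet library, timestamps are plain integers in milliseconds since the Unix epoch, and the offset is placed at the midpoint of the row group's values as the footnote suggests.

```python
INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1
MS_PER_DAY = 86_400_000


def encode_row_group(timestamps_ms):
    """Hypothetical helper: pick a midpoint offset and store values relative to it.

    Returns (offset, deltas, fits_int32). The offset is rounded up so that a
    span of exactly 2**32 - 1 ms still maps onto the signed INT32 range.
    """
    lo, hi = min(timestamps_ms), max(timestamps_ms)
    offset = lo + (hi - lo + 1) // 2
    deltas = [t - offset for t in timestamps_ms]
    fits_int32 = all(INT32_MIN <= d <= INT32_MAX for d in deltas)
    return offset, deltas, fits_int32


def decode_row_group(offset, deltas):
    """Hypothetical helper: recover the absolute timestamps."""
    return [offset + d for d in deltas]


# Widest row group (in days) that INT32 millisecond deltas can cover.
max_span_days = 2**32 / MS_PER_DAY

# A 40-day log fits in INT32 deltas; a 60-day log does not.
start = 1_700_000_000_000
forty_days = [start, start + 40 * MS_PER_DAY]
sixty_days = [start, start + 60 * MS_PER_DAY]
offset_40, deltas_40, fits_40 = encode_row_group(forty_days)
offset_60, deltas_60, fits_60 = encode_row_group(sixty_days)
```

In this sketch a writer would fall back to INT64 for any row group where `fits_int32` is false, which matches the "when possible" wording of the proposal.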
