ryancasburn-KAI opened a new issue, #458:
URL: https://github.com/apache/parquet-format/issues/458

   ### Describe the enhancement requested
   
   Hi, I'm new around here, so please let me know if this request belongs elsewhere.
   
   I'd like to propose an optional type parameter called `Offset` for the TIMESTAMP logical type.
   
   In my most common use case for Parquet files, the data is a running log with many rows, so any single row group is unlikely to span more than a few days.
   
   The idea of the `Offset` parameter is to store, for each row group, an INT64 offset from the Unix epoch. The timestamp values in that row group would then be stored relative to that offset.
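
A minimal sketch of how a writer and reader might apply such an offset, assuming TIMESTAMP(MILLIS) values held as Python ints. The helpers `encode_row_group` and `decode_row_group` are hypothetical names for illustration only; they are not part of the Parquet spec or any Parquet library, and placing the offset at the midpoint of the row group is just one possible strategy.

```python
"""Hypothetical per-row-group offset encoding for TIMESTAMP(MILLIS) values."""
from __future__ import annotations

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def encode_row_group(timestamps_ms: list[int]) -> tuple[int, list[int], bool]:
    """Return (offset, relative_values, fits_in_int32) for one row group.

    The offset is the midpoint of the row group's min and max, rounded up,
    so a span of up to 2**32 - 1 units can use the full signed INT32 range
    [-2**31, 2**31 - 1].
    """
    lo, hi = min(timestamps_ms), max(timestamps_ms)
    offset = lo + (hi - lo + 1) // 2  # would be stored once per row group as INT64
    relative = [t - offset for t in timestamps_ms]
    fits = all(INT32_MIN <= r <= INT32_MAX for r in relative)
    return offset, relative, fits


def decode_row_group(offset: int, relative: list[int]) -> list[int]:
    """Recover absolute epoch-millisecond timestamps from relative values."""
    return [offset + r for r in relative]


# Usage: three timestamps one day apart.
ts = [1_700_000_000_000, 1_700_086_400_000, 1_700_172_800_000]
offset, rel, fits = encode_row_group(ts)
print(offset, rel, fits)
```

In this sketch, a writer could keep plain INT64 storage for any row group where `fits` is `False`.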
   
   This provides a couple of benefits:
   
   1. Row groups could be selectively downsized (when possible) to the INT32 physical type. If I understand correctly, this could save a significant amount of file size. At millisecond precision, INT32 could support row groups spanning up to ~49.7 days.[^1]
   2. [The docs](https://github.com/apache/parquet-format/blob/4f208158dba80ff4bff4afaa4441d7270103dff6/LogicalTypes.md?plain=1#L458) note that all TIMESTAMPs, and particularly those with NANOS precision, have range limitations because they are stored as INT64. Adding an `Offset` would allow practically unlimited ranges for TIMESTAMPs.
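
The arithmetic behind both points can be checked quickly. This sketch assumes a signed INT32 relative value at millisecond precision for point 1, and a plain INT64 nanosecond count from the Unix epoch (no offset) for point 2.

```python
"""Back-of-the-envelope range arithmetic for the two benefits above."""
from datetime import datetime, timedelta, timezone

MS_PER_DAY = 86_400_000
INT32_SPAN = 2**32  # number of distinct INT32 values
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1

# 1. Largest max-minus-min difference representable in INT32 at millisecond
#    precision, with the offset in the middle so both signs are used.
int32_ms_days = (INT32_SPAN - 1) / MS_PER_DAY
print(f"INT32 millis span: ~{int32_ms_days:.1f} days")

# 2. Absolute range of INT64 nanoseconds from the Unix epoch (no offset).
#    datetime only resolves microseconds, so truncate toward zero; this keeps
#    both endpoints inside the INT64 nanosecond range.
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
earliest = epoch + timedelta(microseconds=-((-INT64_MIN) // 1000))
latest = epoch + timedelta(microseconds=INT64_MAX // 1000)
print(f"INT64 nanos range: {earliest.isoformat()} to {latest.isoformat()}")
```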
   
   [^1]: Assuming the offset is set at the midpoint of the row group's values, since INT32 is signed.
   

