wgtmac commented on issue #458:
URL: https://github.com/apache/parquet-format/issues/458#issuecomment-2434767043

   Thanks for opening the issue! I don't think the current file size is a 
problem, since we already have delta encoding. The problems I can see so far 
with adding an offset to the row group metadata are:
   - If we have multiple timestamp columns, we have to add one field for each, 
perhaps as a map<string, long> from column_id to offset.
   - This complicates both the writer and the reader, as they need to do extra 
arithmetic to apply the offset.
   - Row group boundaries are usually transparent to users, which makes it 
harder to set the offset in the row group metadata.
   - We need to guarantee backward compatibility. Old readers that know nothing 
about the offset will produce wrong values. Adding a new timestamp type for 
this looks like overkill to me.
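
To make the per-column map and the backward compatibility concern concrete, 
here is a minimal Python sketch. It does not use any real Parquet API: the 
function names, the column names, and the idea of using each column's minimum 
as its offset are all assumptions made for illustration, and it assumes every 
column in the row group is non-empty.

```python
from typing import Dict, List, Tuple

Columns = Dict[str, List[int]]


def encode_row_group(columns: Columns) -> Tuple[Columns, Dict[str, int]]:
    """Hypothetical writer: store each timestamp column relative to an offset.

    The offset is assumed to be the column minimum within this row group,
    so the row group metadata needs one entry per timestamp column.
    """
    offsets = {name: min(values) for name, values in columns.items()}
    stored = {
        name: [v - offsets[name] for v in values]
        for name, values in columns.items()
    }
    return stored, offsets


def decode_offset_aware(stored: Columns, offsets: Dict[str, int]) -> Columns:
    """Hypothetical new reader: add the per-column offset back."""
    return {
        name: [v + offsets[name] for v in values]
        for name, values in stored.items()
    }


def decode_legacy(stored: Columns) -> Columns:
    """Hypothetical old reader: unaware of the offset, returns raw values."""
    return {name: list(values) for name, values in stored.items()}


# Two timestamp columns (milliseconds since the epoch) in one row group.
row_group = {
    "created_at": [1_700_000_000_000, 1_700_000_000_250, 1_700_000_001_000],
    "updated_at": [1_700_000_500_000, 1_700_000_500_010, 1_700_000_500_020],
}

stored, offsets = encode_row_group(row_group)
round_trip = decode_offset_aware(stored, offsets)
legacy_view = decode_legacy(stored)
```

In this sketch `round_trip` equals `row_group`, while `legacy_view` contains 
the small relative values and silently misreads them as absolute timestamps, 
which is the failure mode described in the last bullet.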


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

