pitrou commented on PR #242:
URL: https://github.com/apache/parquet-format/pull/242#issuecomment-2115136973

   > FWIW I'd be very interested to see how far we can push the current data 
structures with approaches like 
[apache/arrow-rs#5775](https://github.com/apache/arrow-rs/issues/5775), before 
reaching for format changes.
   
   At first sight this would be a Rust-specific optimization. Also, while such 
improvements are good in themselves, they don't address the fundamental issue 
that file metadata size is currently O(n_row_groups * n_columns).
   
   > I'd also observe that the column statistics can already be stored 
separately from FileMetadata, and if you do so you're really only left with a 
couple of integers...
   
   The main change in this PR is that a `RowGroupV3` structure has O(1) size, 
rather than the O(n_columns) size of a `RowGroup`. The rest are assorted improvements.
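   To make the scaling argument concrete, here is a minimal sketch in Python. It assumes heavily simplified, hypothetical stand-ins (`ColumnChunkInfo`, `RowGroupLike`, `RowGroupV3Like`, `row_group_scalars`). None of these are the actual Thrift definitions in `parquet.thrift` or the fields proposed in this PR. The sketch counts only the scalar values stored in row group records. It does not model where per-column information ends up under the proposal.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import List


@dataclass
class ColumnChunkInfo:
    """Hypothetical per-column entry (the real ColumnChunk has many more fields)."""
    offset: int
    compressed_size: int


@dataclass
class RowGroupLike:
    """Current-style row group: embeds one entry per column, so O(n_columns)."""
    num_rows: int
    columns: List[ColumnChunkInfo]


@dataclass
class RowGroupV3Like:
    """Hypothetical fixed-size record in the spirit of RowGroupV3: O(1)."""
    num_rows: int
    offset: int
    byte_size: int


def scalar_fields(obj) -> int:
    """Count scalar values in a dataclass instance, recursing into nested records."""
    total = 0
    for f in fields(obj):
        value = getattr(obj, f.name)
        if isinstance(value, list):
            total += sum(scalar_fields(v) for v in value)
        elif is_dataclass(value):
            total += scalar_fields(value)
        else:
            total += 1
    return total


def row_group_scalars(n_row_groups: int, n_columns: int, v3: bool) -> int:
    """Total scalar values across all row group records for a given file shape."""
    if v3:
        # Size does not depend on n_columns at all.
        rg = RowGroupV3Like(num_rows=0, offset=0, byte_size=0)
    else:
        # One nested entry per column in every row group.
        rg = RowGroupLike(
            num_rows=0,
            columns=[ColumnChunkInfo(0, 0) for _ in range(n_columns)],
        )
    return n_row_groups * scalar_fields(rg)


if __name__ == "__main__":
    # 100 row groups x 1000 columns: 100 * (1 + 2 * 1000) vs 100 * 3
    print(row_group_scalars(100, 1000, v3=False))  # 200100
    print(row_group_scalars(100, 1000, v3=True))   # 300
```

   The current-style count grows as n_row_groups * n_columns. The fixed-size record grows only with n_row_groups. Any per-column data the format still needs would be stored outside these records, and this toy model does not account for it.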
   