pitrou commented on PR #242:
URL: https://github.com/apache/parquet-format/pull/242#issuecomment-2115136973
> FWIW I'd be very interested to see how far we can push the current data structures with approaches like [apache/arrow-rs#5775](https://github.com/apache/arrow-rs/issues/5775), before reaching for format changes.

At first sight, this would be a Rust-specific optimization. Also, while such improvements are good in themselves, they don't address the fundamental issue that file metadata size is currently O(n_row_groups * n_columns).

> I'd also observe that the column statistics can already be stored separately from FileMetadata, and if you do so you're really only left with a couple of integers...

The main change in this PR is that a `RowGroupV3` structure is O(1), instead of O(n_columns) for a `RowGroup`. The rest are assorted improvements.
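To make the size argument concrete, here is a rough counting model, a sketch only. It assumes, as the comment states, that each current `RowGroup` carries one `ColumnChunk` entry per column while each `RowGroupV3` has constant size. It ignores the schema, statistics, and Thrift encoding overhead, and the function names are hypothetical, not part of any Parquet implementation.

```python
"""Hypothetical model of how many entries the footer's row-group list holds.

Assumption: a current RowGroup embeds one ColumnChunk per column, while a
RowGroupV3 is a fixed-size struct. Per-column data that the proposal may
store elsewhere is deliberately not counted here.
"""


def current_row_group_entries(n_row_groups: int, n_columns: int) -> int:
    # One ColumnChunk per (row group, column) pair: O(n_row_groups * n_columns).
    return n_row_groups * n_columns


def v3_row_group_entries(n_row_groups: int, n_columns: int) -> int:
    # One constant-size RowGroupV3 per row group: O(n_row_groups).
    # n_columns is accepted only for symmetry; it does not affect the count.
    return n_row_groups


if __name__ == "__main__":
    for row_groups, columns in [(10, 100), (100, 1_000), (1_000, 10_000)]:
        current = current_row_group_entries(row_groups, columns)
        v3 = v3_row_group_entries(row_groups, columns)
        print(f"{row_groups} row groups x {columns} columns: "
              f"RowGroup list = {current:,} entries, RowGroupV3 list = {v3:,} entries")
```

Under these assumptions, widening a table multiplies the cost of decoding the current row-group list, whereas the `RowGroupV3` list grows only with the number of row groups.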
