alkis opened a new pull request, #43793: URL: https://github.com/apache/arrow/pull/43793
This is an annotated attempt to use [flatbuffers](https://flatbuffers.dev/) as metadata for parquet. The goals are:

1. flatbuffers "parse" extremely fast compared to thrift, which
   - cuts down on critical path latency of processing parquet files
   - is so fast that the O(n) effects of "parsing" metadata vs scanning 1 column are eliminated
2. flatbuffers are typically bulkier than thrift; this PR includes a multitude of optimizations to shrink the size of the flatbuffer metadata
3. keep the flatbuffer object model similar to that of thrift to facilitate easier migration to the new metadata format

To run experiments:

```sh
mkdir arrow/src/o
cd arrow/src/o
cmake .. --preset ninja-benchmarks
ninja -Co && o/relwithdebinfo/parquet-metadata3-benchmark path-to-footers/*
```
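The first two goals can be illustrated with a toy sketch. It does not use the real Thrift or FlatBuffers wire formats, and every name in it (`encode_sequential`, `encode_with_offsets`, and so on) is hypothetical and not part of this PR. It only assumes that a sequentially encoded footer must be walked record by record to reach one column, while an offset-indexed footer can jump straight to it at the cost of a few extra bytes.

```python
import struct


def encode_sequential(records):
    """Length-prefixed records stored back to back (sequential-decode layout)."""
    return b"".join(struct.pack("<I", len(r)) + r for r in records)


def read_sequential(buf, k):
    """Return (record k, number of records skipped to reach it)."""
    pos = 0
    skipped = 0
    for _ in range(k):
        (n,) = struct.unpack_from("<I", buf, pos)
        pos += 4 + n  # must read every earlier length to find the next record
        skipped += 1
    (n,) = struct.unpack_from("<I", buf, pos)
    return buf[pos + 4 : pos + 4 + n], skipped


def encode_with_offsets(records):
    """Same records, preceded by a count and a table of absolute offsets."""
    count = len(records)
    pos = 4 + 4 * count  # records start right after the header
    offsets = []
    for r in records:
        offsets.append(pos)
        pos += 4 + len(r)
    header = struct.pack("<I", count) + struct.pack(f"<{count}I", *offsets)
    return header + encode_sequential(records)


def read_with_offsets(buf, k):
    """Return record k using one table lookup, independent of k."""
    (off,) = struct.unpack_from("<I", buf, 4 + 4 * k)
    (n,) = struct.unpack_from("<I", buf, off)
    return buf[off + 4 : off + 4 + n]


# Hypothetical footer with 1000 per-column metadata records.
records = [f"column_{i}".encode() for i in range(1000)]
seq_buf = encode_sequential(records)
off_buf = encode_with_offsets(records)

last_seq, skipped = read_sequential(seq_buf, 999)
last_off = read_with_offsets(off_buf, 999)
extra_bytes = len(off_buf) - len(seq_buf)  # size cost of the offset table
```

In this sketch the offset-indexed buffer is larger by exactly the size of its header, which mirrors the size trade-off named in goal 2, while the lookup cost no longer grows with the column index.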
