alkis opened a new pull request, #43793:
URL: https://github.com/apache/arrow/pull/43793

   This is an annotated attempt to use [flatbuffers](https://flatbuffers.dev/) 
as the metadata format for parquet. The goals are:
   1. flatbuffers "parse" extremely fast compared to thrift, which
       - cuts down on the critical-path latency of processing parquet files
       - makes "parsing" so cheap that its O(n) cost, relative to scanning 
just 1 column, is effectively eliminated
   2. flatbuffers are typically bulkier than thrift, so this PR includes a 
multitude of optimizations to shrink the size of the flatbuffer metadata
   3. keep the flatbuffer object model similar to that of thrift to ease 
migration to the new metadata format
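
To make the trade-off behind goals 1 and 2 concrete, here is a minimal sketch, not taken from this PR. It contrasts variable-length integer encoding (the style Thrift's compact protocol uses for integers) with fixed-width, offset-addressable fields (the general idea behind flatbuffers). It does not reproduce either wire format; the helper names and the `sizes` values are hypothetical.

```python
import struct


def encode_varints(values):
    """Encode unsigned ints as LEB128-style varints (compact, variable width)."""
    out = bytearray()
    for v in values:
        while True:
            byte = v & 0x7F
            v >>= 7
            if v:
                out.append(byte | 0x80)  # continuation bit: more bytes follow
            else:
                out.append(byte)
                break
    return bytes(out)


def decode_all_varints(buf):
    """Decode sequentially; reaching item k means decoding items 0..k-1 first."""
    values, cur, shift = [], 0, 0
    for b in buf:
        cur |= (b & 0x7F) << shift
        if b & 0x80:
            shift += 7
        else:
            values.append(cur)
            cur, shift = 0, 0
    return values


def encode_fixed(values):
    """Encode as little-endian uint64s (bulkier, but every item sits at a known offset)."""
    return struct.pack(f"<{len(values)}Q", *values)


def read_fixed(buf, k):
    """Read item k directly from its offset, without touching other items."""
    return struct.unpack_from("<Q", buf, 8 * k)[0]


# Hypothetical per-column sizes, used only for illustration.
sizes = [4096, 1, 300, 2**40]
varint_buf = encode_varints(sizes)
fixed_buf = encode_fixed(sizes)
```

In this sketch the varint buffer is smaller than the fixed-width one, but reading a single entry from it requires walking all earlier entries, whereas `read_fixed` jumps straight to the requested offset. That is the size-versus-access-cost tension the goals above describe.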
   
   To run experiments:
   ```sh
   mkdir arrow/src/o
   cd arrow/src/o
   cmake .. --preset ninja-benchmarks
   ninja -Co && o/relwithdebinfo/parquet-metadata3-benchmark path-to-footers/*
   ```
   


