[I] Add FlatBuffers footer to speed up metadata parsing [parquet-format]

via GitHub Mon, 20 Oct 2025 05:32:18 -0700


alamb opened a new issue, #530:
URL: https://github.com/apache/parquet-format/issues/530

### Describe the enhancement requested

@alkis is spearheading the improvement of Parquet footer metadata, with a
proposal for adding an optional FlatBuffers-based footer to the Parquet format.
Andrew Bell noted on the mailing list that there was no issue to track this, so
I am filing one here.

Quoting from the [Proposal
Document](https://docs.google.com/document/d/1kZS_DM_J8n6NKff3vDQPD1Y4xyDdRceYFANUE0bOfb0/edit?tab=t.0)

> Tabular data stored in Parquet files with thousands of leaf columns are
becoming increasingly common, fueled by AI datasets. Decoding Parquet files
always involves a two step process. First the Parquet metadata (aka footer) is
fetched and resolved and then the row groups/column chunks are fetched and
decoded. The two steps are serial because of the natural data dependency
between them. Thus improving the time it takes to resolve Parquet metadata will
allow engines to be more efficient to process Parquet files.
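
To make the serial dependency concrete, here is a minimal sketch of the first step (locating the footer) using only the Python standard library. It assumes the current unencrypted layout, where a file ends with a 4-byte little-endian footer length followed by the `PAR1` magic. The function names (`locate_footer`, `read_footer_bytes`, `make_fake_file`) are hypothetical and are not taken from the proposal or from any Parquet library. The sketch stops before decoding the footer payload, which is the step the proposal aims to speed up.

```python
import io
import struct

MAGIC = b"PAR1"  # magic bytes of a file with an unencrypted footer


def locate_footer(f):
    """Return (offset, length) of the serialized footer in a seekable file."""
    f.seek(0, io.SEEK_END)
    file_size = f.tell()
    # Smallest possible file: leading magic + footer length + trailing magic.
    if file_size < 12:
        raise ValueError("too small to be a Parquet file")
    f.seek(file_size - 8)
    tail = f.read(8)
    if tail[4:] != MAGIC:
        raise ValueError("missing trailing PAR1 magic")
    (footer_len,) = struct.unpack("<I", tail[:4])
    footer_start = file_size - 8 - footer_len
    if footer_start < len(MAGIC):
        raise ValueError("footer length exceeds file size")
    return footer_start, footer_len


def read_footer_bytes(f):
    """Step 1: fetch the raw footer. Decoding it must finish before step 2."""
    offset, length = locate_footer(f)
    f.seek(offset)
    return f.read(length)


def make_fake_file(body, footer):
    """Build an in-memory stand-in with the same framing (illustration only)."""
    return io.BytesIO(MAGIC + body + footer + struct.pack("<I", len(footer)) + MAGIC)
```

A reader cannot know where any row group or column chunk lives until the bytes returned by `read_footer_bytes` have been decoded, so any time spent parsing the footer is added directly to the time before the first data read.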

Related mailing list discussion:
https://lists.apache.org/thread/j9qv5vyg0r4jk6tbm6sqthltly4oztd3

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

