[I] Implement a "push style" API for decoding Parquet Metadata [arrow-rs]

via GitHub Mon, 18 Aug 2025 04:36:47 -0700


alamb opened a new issue, #8164:
URL: https://github.com/apache/arrow-rs/issues/8164


   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   - part of https://github.com/apache/arrow-rs/issues/8000
   
   The current 
[`ParquetMetaDataReader`](https://docs.rs/parquet/latest/parquet/file/metadata/struct.ParquetMetaDataReader.html)
 is a wonder of software engineering thanks to @etseidl. However, it is 
somewhat complicated to use: it has both async and sync methods, and it 
keeps state internally in a non-obvious way. For example, do you call 
`try_parse` or `parse_and_finish`? And how is `load_via_suffix_and_finish` 
related?
   
   Compared to what came before it, ParquetMetaDataReader is an amazing 
improvement, but I think we could do better.
   
   I ran into this when I discovered that Metadata is needed when implementing 
a push decoder for Parquet:
   - https://github.com/apache/arrow-rs/issues/7983
   
   Basically, I want a way to parse the metadata without **ALSO** doing the IO 
at the same time.
   
   **Describe the solution you'd like**
   If we want to truly separate IO and CPU work, we also need a way to decode 
the metadata without explicit IO. Hence this proposal: a way to decode 
metadata "push style", where the decoder tells you which bytes it needs. It 
would follow the same API as the Parquet push decoder.
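
   To make the "push style" idea concrete, here is a minimal sketch of what 
such a decoder could look like. The names `FooterPushDecoder`, `DecodeResult`, 
`push_range`, `try_decode` and `decode_from_memory` are hypothetical and are 
not taken from the parquet crate. The sketch only locates the raw metadata 
bytes using the Parquet footer layout (a 4-byte little-endian metadata length 
followed by the `PAR1` magic); it does not decode the Thrift payload.

```rust
use std::ops::Range;

/// Hypothetical result type: either more bytes are needed, or decoding is done.
#[derive(Debug, PartialEq)]
enum DecodeResult<T> {
    /// The caller should fetch these file ranges and push them to the decoder.
    NeedsData(Vec<Range<u64>>),
    /// Decoding finished with this value.
    Data(T),
}

/// Parquet footer: 4-byte little-endian metadata length, then the magic bytes.
const FOOTER_LEN: u64 = 8;
const MAGIC: &[u8] = b"PAR1";

/// Hypothetical push-style decoder. It never performs IO itself.
struct FooterPushDecoder {
    file_len: u64,
    /// Buffers pushed by the caller, tagged with the file range they cover.
    buffers: Vec<(Range<u64>, Vec<u8>)>,
}

impl FooterPushDecoder {
    fn new(file_len: u64) -> Self {
        Self { file_len, buffers: Vec::new() }
    }

    /// Hand the decoder bytes that the caller obtained however it likes.
    fn push_range(&mut self, range: Range<u64>, data: Vec<u8>) {
        assert_eq!(range.end - range.start, data.len() as u64);
        self.buffers.push((range, data));
    }

    /// Return the bytes for `range` if one pushed buffer fully covers it.
    fn get(&self, range: &Range<u64>) -> Option<&[u8]> {
        self.buffers.iter().find_map(|(r, data)| {
            if r.start <= range.start && range.end <= r.end {
                let start = (range.start - r.start) as usize;
                let end = (range.end - r.start) as usize;
                Some(&data[start..end])
            } else {
                None
            }
        })
    }

    /// Make as much progress as possible with the bytes pushed so far.
    /// On success this yields the raw (still encoded) metadata bytes.
    fn try_decode(&self) -> Result<DecodeResult<Vec<u8>>, String> {
        if self.file_len < FOOTER_LEN {
            return Err("file too small to contain a footer".to_string());
        }
        let footer_range = (self.file_len - FOOTER_LEN)..self.file_len;
        let footer = match self.get(&footer_range) {
            Some(bytes) => bytes,
            None => return Ok(DecodeResult::NeedsData(vec![footer_range])),
        };
        if &footer[4..8] != MAGIC {
            return Err("missing PAR1 magic in footer".to_string());
        }
        let mut len_bytes = [0u8; 4];
        len_bytes.copy_from_slice(&footer[0..4]);
        let metadata_len = u64::from(u32::from_le_bytes(len_bytes));
        let metadata_start = footer_range
            .start
            .checked_sub(metadata_len)
            .ok_or_else(|| "metadata length exceeds file size".to_string())?;
        let metadata_range = metadata_start..footer_range.start;
        match self.get(&metadata_range) {
            Some(bytes) => Ok(DecodeResult::Data(bytes.to_vec())),
            None => Ok(DecodeResult::NeedsData(vec![metadata_range])),
        }
    }
}

/// Example driver: the caller owns the IO (here, slicing an in-memory buffer).
fn decode_from_memory(file: &[u8]) -> Result<Vec<u8>, String> {
    let mut decoder = FooterPushDecoder::new(file.len() as u64);
    loop {
        match decoder.try_decode()? {
            DecodeResult::Data(metadata) => return Ok(metadata),
            DecodeResult::NeedsData(ranges) => {
                for range in ranges {
                    let bytes = file[range.start as usize..range.end as usize].to_vec();
                    decoder.push_range(range, bytes);
                }
            }
        }
    }
}
```

   The point of this shape is that the driver loop can be sync or async, and 
can fetch the requested ranges from a file, object store, or cache, without 
the decoder knowing or caring.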
   
   

