etseidl opened a new pull request, #6392:
URL: https://github.com/apache/arrow-rs/pull/6392

   # Which issue does this PR close?
   
   Relates to #6002
   
   # Rationale for this change
    
   This is an attempt to consolidate Parquet footer/page index reading/parsing 
into a single place. 
   
   # What changes are included in this PR?
   
   The new `ParquetMetaDataReader` combines the code in 
`parquet/src/file/footer.rs` and `parquet/src/arrow/async_reader/metadata.rs` 
into a single API. Using it, the `read_metadata_from_file` call from #6081 
would become:
   ```rust
   fn read_metadata_from_file(file: impl AsRef<Path>) -> ParquetMetaData {
       // the reader accumulates state in `try_parse`, so it must be mutable
       let mut reader = ParquetMetaDataReader::new()
           .with_page_indexes(true);
       let file = std::fs::File::open(file).unwrap();
       // assumes `try_parse` borrows its `ChunkReader` rather than consuming it
       reader.try_parse(&file).unwrap();
       // return ParquetMetaData with page indexes populated
       reader.finish().unwrap()
   }
   ```
   Also included are two async functions, `try_load()` and 
`try_load_from_tail()`. The former combines `MetadataLoader::load()` and 
`MetadataLoader::load_page_index()`. The latter is an attempt to address 
loading the footer when the file size is not known, so it requires a reader 
that can seek relative to the end of the file.
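
   To illustrate why seeking from the end is enough, here is a minimal 
synchronous sketch using only `std`. It is not the code in this PR: the helper 
name `read_metadata_bytes_from_tail` is hypothetical, it only handles the 
unencrypted `PAR1` footer, and it returns the raw Thrift-encoded metadata 
bytes without decoding them. It relies on the Parquet file layout, where the 
last 8 bytes are a 4-byte little-endian metadata length followed by the magic.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Parquet footer: 4-byte little-endian metadata length + 4-byte magic.
const FOOTER_SIZE: usize = 8;
const PARQUET_MAGIC: &[u8; 4] = b"PAR1";

/// Hypothetical helper (not part of the parquet crate): fetch the raw,
/// still Thrift-encoded metadata bytes without knowing the file size,
/// by seeking relative to the end of the stream.
fn read_metadata_bytes_from_tail<R: Read + Seek>(src: &mut R) -> io::Result<Vec<u8>> {
    // Step 1: read the fixed-size footer at the very end of the stream.
    src.seek(SeekFrom::End(-(FOOTER_SIZE as i64)))?;
    let mut footer = [0u8; FOOTER_SIZE];
    src.read_exact(&mut footer)?;

    if footer[4..] != PARQUET_MAGIC[..] {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "missing PAR1 magic",
        ));
    }
    let metadata_len =
        u32::from_le_bytes([footer[0], footer[1], footer[2], footer[3]]) as usize;

    // Step 2: seek back over footer + metadata, then read the metadata.
    src.seek(SeekFrom::End(-((FOOTER_SIZE + metadata_len) as i64)))?;
    let mut metadata = vec![0u8; metadata_len];
    src.read_exact(&mut metadata)?;
    Ok(metadata)
}
```

   The helper accepts any `Read + Seek` source, such as a `std::fs::File` or 
an in-memory `std::io::Cursor`. An async variant would follow the same two 
steps, with each seek-and-read replaced by a suffix range request.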
   
   This implementation is very rough and lacks sufficient safety checks and 
documentation. At this point I'm hoping for feedback on the approach. If this 
seems at all useful, a path forward would be to first add 
`ParquetMetaDataReader` alone, and then in subsequent PRs begin to use it as a 
replacement for other functions, which could then be deprecated. The idea is 
to land as much as possible without breaking changes, and then introduce the 
breaking changes once 54.0.0 is open.
   
   # Are there any user-facing changes?
   
   Eventually, yes.
   
   

