etseidl opened a new pull request, #7369: URL: https://github.com/apache/arrow-rs/pull/7369
# Which issue does this PR close?

Might fix #6476.

# Rationale for this change

`ArrowReaderMetadata::load_async` sometimes had to make multiple passes to fully load Parquet metadata when page indexes were requested. This is because the `AsyncFileReader::get_metadata` function had no way of knowing whether page indexes were desired. Recent API changes allow this information to be passed to `AsyncFileReader`, so the extra page index logic in `load_async` should no longer be necessary.

This version will still do multiple fetches, since no prefetch hint is passed to the metadata reader. A follow-on PR could add this hint to `ArrowReaderOptions`, but that would be a breaking change.

# What changes are included in this PR?

Convert `AsyncFileReader::get_metadata` to use the new `ParquetMetaDataReader::load_via_suffix_and_finish` API to reduce code duplication, and add a `MetadataSuffixFetch` implementation to allow its use in `ArrowReaderMetadata::load_async`.

# Are there any user-facing changes?

No
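
The rationale's point that loading still takes multiple fetches when no prefetch hint is passed can be illustrated with a toy model. The sketch below is not the `parquet` crate's code: `CountingStore`, `toy_file`, and `load_metadata` are hypothetical names, and page indexes are not modeled. It relies only on the Parquet file layout, in which a file ends with a 4-byte little-endian metadata length followed by the magic bytes `PAR1`.

```rust
/// Hypothetical in-memory stand-in for a remote file that counts suffix requests.
struct CountingStore {
    data: Vec<u8>,
    requests: usize,
}

impl CountingStore {
    /// Return the last `n` bytes (clamped to the file length); counts as one request.
    fn fetch_suffix(&mut self, n: usize) -> Vec<u8> {
        self.requests += 1;
        let start = self.data.len().saturating_sub(n);
        self.data[start..].to_vec()
    }
}

/// Parquet footer: 4-byte little-endian metadata length followed by b"PAR1".
const FOOTER_LEN: usize = 8;

/// Build a toy file: `body_len` zero bytes, then `metadata`, then the 8-byte footer.
fn toy_file(body_len: usize, metadata: &[u8]) -> Vec<u8> {
    let mut out = vec![0u8; body_len];
    out.extend_from_slice(metadata);
    out.extend_from_slice(&(metadata.len() as u32).to_le_bytes());
    out.extend_from_slice(b"PAR1");
    out
}

/// Load the metadata bytes using suffix requests only.
/// With no hint, only the footer is fetched first, so a second request is needed.
fn load_metadata(store: &mut CountingStore, prefetch_hint: Option<usize>) -> Vec<u8> {
    let first = prefetch_hint.unwrap_or(FOOTER_LEN).max(FOOTER_LEN);
    let suffix = store.fetch_suffix(first);
    assert!(suffix.len() >= FOOTER_LEN, "file too small");

    let footer = &suffix[suffix.len() - FOOTER_LEN..];
    assert_eq!(&footer[4..], b"PAR1", "bad magic");
    let meta_len = u32::from_le_bytes([footer[0], footer[1], footer[2], footer[3]]) as usize;

    // Re-fetch only if the first suffix did not already cover the metadata.
    let needed = meta_len + FOOTER_LEN;
    let suffix = if suffix.len() >= needed {
        suffix
    } else {
        store.fetch_suffix(needed)
    };
    assert!(suffix.len() >= needed, "truncated file");

    let end = suffix.len() - FOOTER_LEN;
    suffix[end - meta_len..end].to_vec()
}
```

In this model, calling `load_metadata` with `None` on a file holding 18 bytes of metadata issues two requests, while a hint of `Some(64)` covers both footer and metadata in a single request. A real reader that also wants page indexes may need further requests beyond what this sketch shows.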
