suremarc opened a new issue, #4090: URL: https://github.com/apache/arrow-rs/issues/4090
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

Currently the [`ParquetMetaData`](https://docs.rs/parquet/latest/parquet/file/metadata/struct.ParquetMetaData.html) object has optional fields for the column and offset indexes, which are unpopulated at first. When the `ArrowReaderBuilder` is created using `ArrowReaderOptions::with_page_index(true)`, it loads the page index at query time. This is potentially suboptimal, because every query pays for an extra request, typically to object storage, which is high-latency.

**Describe the solution you'd like**

A new method for the `ParquetObjectReader` that toggles loading the page index at construction time, something like this:

```rust
impl ParquetObjectReader {
    /// Proposed builder-style toggle; `preload_page_index` would be a new field.
    pub fn preload_page_index(mut self, should_preload: bool) -> Self {
        // Store the caller's choice and hand the reader back for chaining.
        self.preload_page_index = should_preload;
        self
    }
}
```

This would trigger conditional logic in the `get_metadata` function to return metadata with the page index already loaded.

**Describe alternatives you've considered**

A public async API for deserializing the column and offset indexes, similar to [`index_reader`](https://docs.rs/parquet/latest/parquet/file/page_index/index_reader/index.html) but with async support and integrated with `AsyncFileReader` to enable coalescing of multiple fetches.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
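To make the proposed behaviour concrete, here is a minimal, self-contained sketch of how such a toggle might change what `get_metadata` returns. Everything in it is a hypothetical stand-in: `MockObjectReader`, `Metadata`, `with_preload_page_index`, and the `requests` counter are invented for illustration and are not part of the parquet crate, and the offsets are placeholder values. The real implementation would be async and would fetch from object storage.

```rust
// Hypothetical stand-ins only; none of these types are the parquet crate's API.

#[derive(Debug, Clone, Default, PartialEq)]
struct Metadata {
    // `None` until the page index is loaded, mirroring the optional
    // column and offset index fields described in the issue.
    page_index: Option<Vec<u64>>,
}

struct MockObjectReader {
    // Proposed flag: load the page index together with the footer metadata.
    preload_page_index: bool,
    // Counts simulated round trips to object storage.
    requests: usize,
}

impl MockObjectReader {
    fn new() -> Self {
        Self {
            preload_page_index: false,
            requests: 0,
        }
    }

    // Builder-style toggle: takes the reader by value, stores the flag,
    // and returns the reader so calls can be chained.
    fn with_preload_page_index(mut self, should_preload: bool) -> Self {
        self.preload_page_index = should_preload;
        self
    }

    // Simulates fetching the footer metadata. When the flag is set, the
    // page index is fetched in the same call instead of at query time.
    fn get_metadata(&mut self) -> Metadata {
        self.requests += 1; // simulated footer fetch
        let mut metadata = Metadata::default();
        if self.preload_page_index {
            self.requests += 1; // simulated page index fetch
            metadata.page_index = Some(vec![0, 4096, 8192]); // placeholder offsets
        }
        metadata
    }
}
```

With the flag left at its default, `get_metadata` in this sketch returns metadata whose `page_index` is `None`. With `with_preload_page_index(true)` the index is populated up front, so a caller that caches the metadata would not need a further request per query.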
