suremarc opened a new issue, #4090:
URL: https://github.com/apache/arrow-rs/issues/4090

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   Currently the 
[`ParquetMetaData`](https://docs.rs/parquet/latest/parquet/file/metadata/struct.ParquetMetaData.html)
 object has optional fields for the column & offset indexes which are 
unpopulated at first. When the `ArrowReaderBuilder` is created using 
`ArrowReaderOptions::with_page_index(true)`, it loads the page index at query 
time. This is potentially suboptimal: every query pays the latency of an extra 
request for the page index, typically to object storage, where each round trip 
is slow. 
   
   **Describe the solution you'd like**
   A new method for the `ParquetObjectReader` that toggles loading the page 
index at construction time, something like this:
   ```rust
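   impl ParquetObjectReader {
       /// Proposed builder-style toggle (not an existing API).
       pub fn preload_page_index(mut self, should_preload: bool) -> Self {
           // Assumes a `preload_page_index: bool` field on the struct.
           self.preload_page_index = should_preload;
           self
       }
   }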
   ```
   
   which would trigger conditional logic in the `get_metadata` function to 
return metadata with the page index already loaded. 
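
To make the intended behavior concrete, here is a minimal, std-only sketch of that conditional logic. It is a toy model, not the `parquet` crate: `ToyObjectReader`, `Metadata`, `PageIndex`, and the request counter are all hypothetical stand-ins, and the real `get_metadata` is async.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for the column and offset indexes.
#[derive(Debug, Clone, PartialEq)]
pub struct PageIndex {
    pub pages_per_column: Vec<usize>,
}

/// Hypothetical stand-in for `ParquetMetaData`: the page index is optional.
#[derive(Debug, Clone, PartialEq)]
pub struct Metadata {
    pub num_rows: u64,
    pub page_index: Option<PageIndex>,
}

/// Toy reader that counts how many storage requests it issues.
pub struct ToyObjectReader {
    preload_page_index: bool,
    requests: Cell<usize>,
}

impl ToyObjectReader {
    pub fn new() -> Self {
        Self { preload_page_index: false, requests: Cell::new(0) }
    }

    /// Builder-style toggle mirroring the proposed method.
    pub fn preload_page_index(mut self, should_preload: bool) -> Self {
        self.preload_page_index = should_preload;
        self
    }

    /// Simulates the footer fetch; the page index starts out unpopulated.
    fn fetch_footer(&self) -> Metadata {
        self.requests.set(self.requests.get() + 1);
        Metadata { num_rows: 1_000, page_index: None }
    }

    /// Simulates the extra request needed to read the page index.
    fn fetch_page_index(&self) -> PageIndex {
        self.requests.set(self.requests.get() + 1);
        PageIndex { pages_per_column: vec![4, 4] }
    }

    /// Returns metadata, with the page index attached only when requested.
    pub fn get_metadata(&self) -> Metadata {
        let mut metadata = self.fetch_footer();
        if self.preload_page_index {
            metadata.page_index = Some(self.fetch_page_index());
        }
        metadata
    }

    /// Number of simulated storage requests made so far.
    pub fn requests(&self) -> usize {
        self.requests.get()
    }
}
```

In this model the extra request is paid once, when the metadata is constructed. A caller that caches the returned metadata could then reuse the page index across queries instead of fetching it again each time.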
   
   **Describe alternatives you've considered**
   A public async API for deserializing the column & offset index, similar to 
[`index_reader`](https://docs.rs/parquet/latest/parquet/file/page_index/index_reader/index.html)
 but with async support and integrated with `AsyncFileReader` to enable 
coalescing of multiple fetches. 
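
As an illustration of what coalescing could mean here, the sketch below merges nearby byte ranges (for example, the column index and offset index regions of several columns) so they can be served by fewer requests. The function name `coalesce_ranges` and the `max_gap` parameter are assumptions made for this example, not part of any existing `parquet` API.

```rust
use std::ops::Range;

/// Merges byte ranges that overlap or lie within `max_gap` bytes of each
/// other, so that one request can cover several nearby reads.
pub fn coalesce_ranges(mut ranges: Vec<Range<usize>>, max_gap: usize) -> Vec<Range<usize>> {
    // Sort by start offset so that only neighbours need to be compared.
    ranges.sort_by_key(|r| r.start);

    let mut merged: Vec<Range<usize>> = Vec::new();
    for r in ranges {
        if let Some(last) = merged.last_mut() {
            // Close enough to the previous range: extend it instead.
            if r.start <= last.end.saturating_add(max_gap) {
                last.end = last.end.max(r.end);
                continue;
            }
        }
        merged.push(r);
    }
    merged
}
```

A larger `max_gap` trades some wasted bytes for fewer round trips, which tends to be the right trade on high-latency object storage.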
   

