masonh22 opened a new issue, #6559: URL: https://github.com/apache/arrow-rs/issues/6559
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

I've noticed low CPU utilization when reading from low-bandwidth filesystems using a `ParquetRecordBatchStream`. This appears to happen because the stream fetches row group data on demand rather than ahead of time.

In my specific scenario, I'm reading a Parquet file from S3 with four 128MB row groups. Each row group takes ~2 seconds to fetch and ~500ms to decode. In all, it takes around 10 seconds to read and decode the entire file.

**Describe the solution you'd like**

I'd like to add an option for `ParquetRecordBatchStream` to fetch the data for the next row group while decoding the data for the current row group.

**Describe alternatives you've considered**

**Additional context**

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
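To make the requested overlap concrete, here is a minimal sketch using only the Rust standard library. It is not the `parquet` crate's API and not a proposed implementation: `fetch_row_group`, `decode_row_group`, `read_with_prefetch`, `sequential_ms`, and `prefetch_ms` are hypothetical names, and a background thread with a rendezvous channel stands in for the async stream. The two timing functions model the numbers quoted in the issue, assuming every row group has the same fetch and decode cost and only one row group is fetched ahead.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Estimated total time when each row group is fetched and then decoded,
/// strictly one after the other (the on-demand behaviour described above).
fn sequential_ms(row_groups: u64, fetch_ms: u64, decode_ms: u64) -> u64 {
    row_groups * (fetch_ms + decode_ms)
}

/// Estimated total time when the fetch of row group i+1 overlaps the decode
/// of row group i. The first fetch and the last decode cannot be hidden;
/// every step in between costs as much as the slower of the two stages.
fn prefetch_ms(row_groups: u64, fetch_ms: u64, decode_ms: u64) -> u64 {
    if row_groups == 0 {
        return 0;
    }
    fetch_ms + (row_groups - 1) * fetch_ms.max(decode_ms) + decode_ms
}

/// Hypothetical stand-in for a network read: returns `index + 1` bytes.
fn fetch_row_group(index: usize) -> Vec<u8> {
    vec![index as u8; index + 1]
}

/// Hypothetical stand-in for decoding: reports how many bytes it saw.
fn decode_row_group(bytes: &[u8]) -> usize {
    bytes.len()
}

/// Decodes row groups in order while a background thread fetches the next one.
fn read_with_prefetch(num_row_groups: usize) -> Vec<usize> {
    // Capacity 0 makes `send` block until the consumer takes the value, so
    // the fetcher holds at most one fetched row group beyond the one being
    // decoded.
    let (tx, rx) = sync_channel::<Vec<u8>>(0);

    let fetcher = thread::spawn(move || {
        for index in 0..num_row_groups {
            let bytes = fetch_row_group(index);
            // Stop early if the consumer has gone away.
            if tx.send(bytes).is_err() {
                break;
            }
        }
        // `tx` is dropped here, which ends the consumer's iteration.
    });

    // While this loop decodes row group i, the fetcher is already working
    // on row group i + 1.
    let decoded: Vec<usize> = rx.iter().map(|bytes| decode_row_group(&bytes)).collect();

    fetcher.join().expect("fetcher thread panicked");
    decoded
}
```

With the figures from the issue, `sequential_ms(4, 2000, 500)` gives 10000 ms, which matches the roughly 10 seconds reported, while `prefetch_ms(4, 2000, 500)` gives 8500 ms. Under this model the gain is bounded by the decode time, because fetching remains the slower stage.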
