lidavidm opened a new pull request #9620: URL: https://github.com/apache/arrow/pull/9620
This provides an async Parquet reader where the unit of concurrency is a single row group.

There are still some caveats:

- [ ] This implementation is unsafe if pre_buffer=True in ArrowReaderProperties. Instead, the user needs to manually call file_reader()->parquet_reader()->PreBuffer(). I expect the kind of application that uses the async reader would want to control this anyway, so I lean towards simply failing the call if the user has pre_buffer=True. The other commit in this PR provides a version that is safe with pre_buffer=True, at the cost of some code duplication.
- [ ] There are some TODOs scattered around to be addressed after #9607 is merged.
- [ ] Docstrings need writing.
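To illustrate what "the unit of concurrency is a single row group" means, here is a minimal standard-library sketch. It is not the code in this PR and does not use Arrow's futures or the Parquet reader: `ReadRowGroup` and `ReadAllRowGroups` are hypothetical names, and the "decoded" values are fabricated so the example runs without a file.

```cpp
#include <cstdint>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical stand-in for decoding one row group; NOT an Arrow/Parquet API.
// It returns made-up values so the sketch runs without any file on disk.
std::vector<int64_t> ReadRowGroup(int row_group) {
  std::vector<int64_t> values(4);
  std::iota(values.begin(), values.end(),
            static_cast<int64_t>(row_group) * 100);
  return values;
}

// Launch one task per row group, then collect results in row-group order.
// Each row group is an independent unit of work; nothing finer-grained
// (such as individual columns) is parallelized in this sketch.
std::vector<std::vector<int64_t>> ReadAllRowGroups(int num_row_groups) {
  std::vector<std::future<std::vector<int64_t>>> pending;
  pending.reserve(num_row_groups);
  for (int i = 0; i < num_row_groups; ++i) {
    pending.push_back(std::async(std::launch::async, ReadRowGroup, i));
  }

  std::vector<std::vector<int64_t>> batches;
  batches.reserve(num_row_groups);
  for (auto& f : pending) {
    batches.push_back(f.get());  // Blocks until that row group is ready.
  }
  return batches;
}
```

Calling `ReadAllRowGroups(3)` starts three tasks and returns three batches in row-group order. A real reader would return futures to the caller instead of blocking on `get()`, which this sketch does only to stay short.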