tustvold opened a new pull request #1154: URL: https://github.com/apache/arrow-rs/pull/1154
**Proof of concept, tests are currently extremely limited**

# Which issue does this PR close?

Closes #111.

# Rationale for this change

See ticket. In particular, I wanted to confirm that it is possible to create an async parquet reader without any major changes to the parquet crate. This seems to come up as a frequent ask from the community, and I think we could support it without any major churn.

# What changes are included in this PR?

Adds a layer of indirection to `array_reader` to abstract it away from files. _I think this change may stand on its own merits_.

It then adds a `ParquetRecordBatchStream`, which is a `Stream` that yields `RecordBatch`. Under the hood, this uses async to read row groups into memory and then feeds these into the non-async decoders.

The [parquet docs](https://parquet.apache.org/documentation/latest/) describe the column chunk as the unit of IO, and so I think buffering compressed row groups in memory is not an impractical approach. It also avoids having to maintain sync and async versions of all the decoders, readers, etc.

# Are there any user-facing changes?

The only changes are to `array_reader`, which since #1133 no longer has stability guarantees.
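
To illustrate the approach described under "What changes are included in this PR?", here is a minimal, std-only sketch of the pattern: fetch each row group's bytes asynchronously, then hand the in-memory buffer to a purely synchronous decoder. It is not the code in this PR. `InMemorySource`, `fetch_row_group`, `decode_row_group`, `read_all` and `block_on` are all hypothetical names invented for the illustration, and the "row groups" are just little-endian `u32` values rather than real parquet data.

```rust
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for an async byte source such as object storage.
struct InMemorySource {
    row_groups: Vec<Vec<u8>>,
}

impl InMemorySource {
    /// Asynchronously fetch the raw bytes of one row group.
    /// A real source would await network or disk IO here.
    async fn fetch_row_group(&self, idx: usize) -> Vec<u8> {
        self.row_groups[idx].clone()
    }
}

/// Synchronous decoder that only ever sees an in-memory buffer.
/// Stand-in for the existing non-async decoders.
fn decode_row_group(buf: &[u8]) -> Vec<u32> {
    buf.chunks_exact(4)
        .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}

/// The only async step is the fetch; decoding stays synchronous.
async fn read_all(source: &InMemorySource) -> Vec<Vec<u32>> {
    let mut batches = Vec::new();
    for idx in 0..source.row_groups.len() {
        let buf = source.fetch_row_group(idx).await;
        batches.push(decode_row_group(&buf));
    }
    batches
}

/// Waker that does nothing, enough for the busy-polling executor below.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Tiny busy-polling executor so the sketch needs no external crates.
/// Real code would use an async runtime instead.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

Calling `block_on(read_all(&source))` drives the whole read to completion. The point of the split is that only `fetch_row_group` needs an async variant, which is why the decoders would not have to be duplicated.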