gatesn commented on issue #7983: URL: https://github.com/apache/arrow-rs/issues/7983#issuecomment-3165065017
I'm not sure I understand why this model isn't possible with the pull-based reader. I could implement an [AsyncFileReader](https://docs.rs/parquet/latest/parquet/arrow/async_reader/trait.AsyncFileReader.html) that enqueues IO requests to a channel along with a oneshot callback, and returns the oneshot as the future. Now I have a stream of IO requests (which can be handled however we like) and a stream of record batches that no longer depends on Tokio and can be driven from the `futures` crate's `block_on` to expose a sync API.

This is similar to many other Rust APIs, e.g. tokio-postgres, where you're given a background connection (the IO stream) to spawn: https://docs.rs/tokio-postgres/latest/tokio_postgres/

One possible solution to prefetching that we thought of is a custom implementation of `StreamExt::buffered` that doesn't take a constant value, but instead takes a handle to the IO dispatcher and continues to pull futures and poll them until the IO prefetching queue reaches a sufficient size.
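A minimal sketch of the channel-based reader described above, assuming only the standard library so it stays self-contained. It does not implement the real `AsyncFileReader` trait (that would pull in the `parquet`, `bytes` and `futures` crates); `ChannelReader::get_bytes` is a hypothetical stand-in loosely modeled on `AsyncFileReader::get_bytes`, without error handling. The hypothetical `Slot`/`Reply` pair plays the role of the oneshot, and a hand-rolled `block_on` stands in for `futures::executor::block_on`. What it tries to show is the split: the decode side only emits requests and awaits replies, while whoever drains the channel decides how the IO is actually done.

```rust
use std::future::Future;
use std::ops::Range;
use std::pin::Pin;
use std::sync::{mpsc, Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread;

/// One-shot reply slot shared by the decoder and the IO side (hypothetical).
#[derive(Default)]
struct Slot(Mutex<(Option<Vec<u8>>, Option<Waker>)>);
impl Slot {
    /// IO side: store the bytes, then wake whichever task is waiting on them.
    fn complete(&self, bytes: Vec<u8>) {
        let mut state = self.0.lock().unwrap();
        state.0 = Some(bytes);
        if let Some(w) = state.1.take() { w.wake(); }
    }
}

/// The future handed back to the decoder: the "oneshot" half of a request.
struct Reply(Arc<Slot>);
impl Future for Reply {
    type Output = Vec<u8>;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Vec<u8>> {
        let slot: &Slot = &self.0;
        let mut state = slot.0.lock().unwrap();
        if let Some(bytes) = state.0.take() { return Poll::Ready(bytes); }
        state.1 = Some(cx.waker().clone()); // wake this task when bytes arrive
        Poll::Pending
    }
}

/// What the IO side receives: a byte range plus the slot to reply into.
struct IoRequest { range: Range<usize>, reply: Arc<Slot> }

/// Hypothetical stand-in for an `AsyncFileReader` impl: it performs no IO and
/// needs no runtime; it only enqueues requests and returns their futures.
struct ChannelReader { tx: mpsc::Sender<IoRequest> }
impl ChannelReader {
    fn get_bytes(&self, range: Range<usize>) -> Reply {
        let reply = Arc::new(Slot::default());
        let request = IoRequest { range, reply: Arc::clone(&reply) };
        self.tx.send(request).expect("IO side has shut down");
        Reply(reply)
    }
}

/// Minimal executor: poll, then park the thread until a waker unparks it.
struct Unparker(thread::Thread);
impl Wake for Unparker { fn wake(self: Arc<Self>) { self.0.unpark(); } }

fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(Unparker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) { return out; }
        thread::park(); // a spurious wakeup only causes an extra poll
    }
}

fn run_demo() -> u32 {
    let (tx, rx) = mpsc::channel::<IoRequest>();
    // IO half: a plain thread over an in-memory "file"; it could equally be
    // a Tokio task, a thread pool, or an io_uring loop.
    let file: Vec<u8> = (0u8..=255).collect();
    let io = thread::spawn(move || {
        for IoRequest { range, reply } in rx {
            reply.complete(file[range].to_vec());
        }
    });
    // Decode half: ordinary async code driven by `block_on`, no Tokio involved.
    let reader = ChannelReader { tx };
    let total = block_on(async {
        let header = reader.get_bytes(0..4).await;
        let footer = reader.get_bytes(10..12).await;
        header.iter().chain(&footer).map(|&b| u32::from(b)).sum::<u32>()
    });
    drop(reader); // dropping the only Sender ends the IO loop
    io.join().unwrap();
    total
}

fn main() { println!("sum of fetched bytes = {}", run_demo()); }
```

Running it prints `sum of fetched bytes = 27` (bytes 0 to 3 plus bytes 10 and 11). Because the decode side never touches a runtime, the same channel could be drained by a spawned Tokio task in an async application or by a blocking thread behind a sync API, in the spirit of the tokio-postgres background connection.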
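The prefetching idea from the last paragraph can be illustrated with a std-only sketch. All names are hypothetical: `Dispatcher` stands in for the IO dispatcher handle, `Read` is a future that registers its IO request only when first polled (which is why a prefetcher has to poll futures rather than merely create them), and `AdaptiveBuffered::poll_next` plays the role of `Stream::poll_next` over a plain `Iterator` of futures instead of implementing the `futures` crate's `Stream` trait. Where `StreamExt::buffered(n)` caps concurrency at a fixed `n`, this adapter keeps pulling and polling futures until the dispatcher reports at least `target_bytes` queued, while still yielding results in source order.

```rust
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// Reply slot: the served bytes plus the waker of the task awaiting them.
type Slot = Mutex<(Option<Vec<u8>>, Option<Waker>)>;
/// Hypothetical IO dispatcher: a FIFO of (length, reply slot) read requests.
#[derive(Default)]
struct Dispatcher(Mutex<VecDeque<(usize, Arc<Slot>)>>);
impl Dispatcher {
    fn queued_bytes(&self) -> usize {
        self.0.lock().unwrap().iter().map(|(len, _)| *len).sum()
    }
    /// Stand-in for a real IO loop: serve every queued read with zeroed bytes.
    fn serve_all(&self) {
        let batch: Vec<_> = self.0.lock().unwrap().drain(..).collect();
        for (len, slot) in batch {
            let mut state = slot.lock().unwrap();
            state.0 = Some(vec![0; len]);
            if let Some(w) = state.1.take() { w.wake(); }
        }
    }
}

/// A read that enqueues its request only when first polled, so a prefetcher
/// has to *poll* futures, not merely create them.
struct Read { len: usize, io: Arc<Dispatcher>, slot: Option<Arc<Slot>> }
impl Future for Read {
    type Output = Vec<u8>;
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Vec<u8>> {
        if self.slot.is_none() {
            let slot = Arc::new(Slot::default());
            self.io.0.lock().unwrap().push_back((self.len, Arc::clone(&slot)));
            self.slot = Some(slot);
        }
        let mut state = self.slot.as_ref().unwrap().lock().unwrap();
        if let Some(bytes) = state.0.take() { return Poll::Ready(bytes); }
        state.1 = Some(cx.waker().clone()); // re-registered on every poll
        Poll::Pending
    }
}

enum Entry<F: Future> { Running(Pin<Box<F>>), Done(F::Output) }
/// `buffered`-like adapter bounded by queued IO bytes instead of a fixed count.
struct AdaptiveBuffered<I, F: Future> {
    source: I, entries: VecDeque<Entry<F>>, io: Arc<Dispatcher>, target_bytes: usize,
}
impl<I: Iterator<Item = F>, F: Future> AdaptiveBuffered<I, F> {
    /// Plays the role of `Stream::poll_next`; outputs keep source order.
    fn poll_next(&mut self, cx: &mut Context<'_>) -> Poll<Option<F::Output>> {
        // Pull and poll new futures until enough IO is queued (at least one if idle).
        while self.entries.is_empty() || self.io.queued_bytes() < self.target_bytes {
            let Some(fut) = self.source.next() else { break };
            let mut fut = Box::pin(fut);
            self.entries.push_back(match fut.as_mut().poll(cx) {
                Poll::Ready(out) => Entry::Done(out),
                Poll::Pending => Entry::Running(fut),
            });
        }
        // Drive every future already in flight.
        for entry in self.entries.iter_mut() {
            if let Entry::Running(fut) = entry {
                if let Poll::Ready(out) = fut.as_mut().poll(cx) { *entry = Entry::Done(out); }
            }
        }
        // Yield the head only once it has finished.
        match self.entries.pop_front() {
            None => Poll::Ready(None), // source exhausted and everything yielded
            Some(Entry::Done(out)) => Poll::Ready(Some(out)),
            Some(head) => { self.entries.push_front(head); Poll::Pending }
        }
    }
}

struct NoopWaker;
impl Wake for NoopWaker { fn wake(self: Arc<Self>) {} }

fn run_demo() -> (usize, Vec<usize>) {
    let io = Arc::new(Dispatcher::default());
    let lens_in = vec![3, 5, 4, 6, 2].into_iter(); // futures are built lazily below
    let mut stream: AdaptiveBuffered<_, Read> = AdaptiveBuffered {
        source: lens_in.map(|len| Read { len, io: Arc::clone(&io), slot: None }),
        entries: VecDeque::new(), io: Arc::clone(&io),
        target_bytes: 10, // hypothetical prefetch target
    };
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let (mut peak, mut lens) = (0, Vec::new());
    loop { // serving IO manually between polls keeps the demo deterministic
        match stream.poll_next(&mut cx) {
            Poll::Ready(Some(buf)) => lens.push(buf.len()),
            Poll::Ready(None) => return (peak, lens),
            Poll::Pending => { peak = peak.max(io.queued_bytes()); io.serve_all(); }
        }
    }
}

fn main() { println!("{:?}", run_demo()); }
```

Running it prints `(12, [3, 5, 4, 6, 2])`: with a 10-byte target, the first poll admits the 3-, 5- and 4-byte reads before the queued total reaches the target, so the queue overshoots by less than one read, and results still come back in request order.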
