kylebarron commented on issue #9423:
URL: https://github.com/apache/arrow-rs/issues/9423#issuecomment-4269176426

   > in `arro3`'s case, I think this is only because it actually eagerly 
collects an entire file (?) from object store into RAM and converts it into a 
`Table` before serving rows from it: 
[kylebarron/arro3@`4cf69f4`/arro3-io/src/parquet.rs#L77](https://github.com/kylebarron/arro3/blob/4cf69f475bba07a6eec098b8351057ea15be0c62/arro3-io/src/parquet.rs#L77)
   
   Yes, I _think_ that's accurate. I spent the most time on the core Arrow 
classes in `arro3-core` and never spent _that_ much time trying to make 
Parquet loading efficient. And that was before I learned (in 
https://github.com/developmentseed/obstore) how to expose a Rust async stream 
to Python as an async iterator. We could refactor the `arro3-io` reader to 
expose an async stream of Arrow record batches without too much difficulty.
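
Here is a rough illustration of the difference between eagerly collecting a whole file and streaming it. This minimal Python sketch shows the async-iterator protocol (`__aiter__`/`__anext__`, ending with `StopAsyncIteration`) that a Rust-backed stream object would present on the Python side. Several pieces are hypothetical and exist only for illustration: the names `BatchStream`, `make_source`, and `collect`; the use of plain lists as "batches" in place of Arrow record batches; and the `asyncio.sleep(0)` call, which stands in for an object-store read. None of this is the arro3 or obstore API.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional, Tuple

# Hypothetical: a "batch" is a plain list here; a real reader would yield
# Arrow record batches (e.g. one per Parquet row group).
Batch = List[int]
FetchNext = Callable[[], Awaitable[Optional[Batch]]]


class BatchStream:
    """Async iterator over batches using Python's __aiter__/__anext__
    protocol (the shape a Rust-backed stream object would expose)."""

    def __init__(self, fetch_next: FetchNext) -> None:
        self._fetch_next = fetch_next

    def __aiter__(self) -> "BatchStream":
        return self

    async def __anext__(self) -> Batch:
        # Fetch exactly one batch per step; nothing is read ahead.
        batch = await self._fetch_next()
        if batch is None:
            raise StopAsyncIteration
        return batch


def make_source(row_groups: List[Batch]) -> Tuple[FetchNext, List[int]]:
    """Hypothetical source that serves one row group per call and records
    which row-group indices were actually fetched."""
    fetched: List[int] = []
    position = 0

    async def fetch_next() -> Optional[Batch]:
        nonlocal position
        if position >= len(row_groups):
            return None
        await asyncio.sleep(0)  # stand-in for a network/object-store read
        fetched.append(position)
        batch = row_groups[position]
        position += 1
        return batch

    return fetch_next, fetched


async def collect(stream: BatchStream, limit: Optional[int] = None) -> List[Batch]:
    """Consume up to `limit` batches (all of them if limit is None)."""
    out: List[Batch] = []
    async for batch in stream:
        out.append(batch)
        if limit is not None and len(out) >= limit:
            break
    return out
```

In this sketch, a consumer that reads only the first batch causes only the first row group to be fetched. With `fetch_next, fetched = make_source([[1, 2], [3, 4], [5]])`, running `asyncio.run(collect(BatchStream(fetch_next), limit=1))` returns `[[1, 2]]`, and `fetched` is `[0]`. An eager reader, by contrast, pulls every row group into memory before serving the first row.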
   
   I did also prototype an async Parquet API, similar to pyarrow's, in 
https://github.com/kylebarron/arro3/pull/313, but I never stabilized it enough 
to merge it.
   
   If either of these interests you, I'm happy to discuss more on arro3 issues.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.