wjones127 opened a new issue #1348:
URL: https://github.com/apache/arrow-rs/issues/1348


   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   Enable receiving/sending a stream of Record Batches from/to another Arrow 
implementation. For example, 
https://github.com/datafusion-contrib/datafusion-python/pull/21 could benefit 
from a way to import a RecordBatchReader into Rust so it can be used by 
DataFusion.
   
   **Describe the solution you'd like**
   
   It might be worth implementing the [Arrow C Stream 
interface](https://arrow.apache.org/docs/format/CStreamInterface.html), which 
allows exporting a stream of record batches. This could enable conversion 
between a PyArrow RecordBatchReader and some structure on the Rust 
side (an iterator of record batches?).
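
To make the idea concrete, here is a minimal sketch of what the consumer side might look like. The `ArrowArray` and `ArrowArrayStream` layouts follow the C definitions in the Arrow specification; everything else (`StreamReader`, `toy_stream`, and the toy producer callbacks) is a hypothetical name invented for illustration, not existing arrow-rs API. The toy producer emits empty arrays and leaves `get_schema` and `get_last_error` unset, which a real producer must not do.

```rust
use std::os::raw::{c_char, c_int, c_void};
use std::ptr;

/// Mirrors `struct ArrowArray` from the Arrow C data interface.
#[repr(C)]
pub struct ArrowArray {
    pub length: i64,
    pub null_count: i64,
    pub offset: i64,
    pub n_buffers: i64,
    pub n_children: i64,
    pub buffers: *mut *const c_void,
    pub children: *mut *mut ArrowArray,
    pub dictionary: *mut ArrowArray,
    pub release: Option<unsafe extern "C" fn(*mut ArrowArray)>,
    pub private_data: *mut c_void,
}

impl ArrowArray {
    /// A released (empty) array: `release` is NULL.
    pub fn empty() -> Self {
        ArrowArray {
            length: 0, null_count: 0, offset: 0, n_buffers: 0, n_children: 0,
            buffers: ptr::null_mut(), children: ptr::null_mut(),
            dictionary: ptr::null_mut(), release: None, private_data: ptr::null_mut(),
        }
    }
}

/// Opaque stand-in: this sketch never reads the schema.
#[repr(C)]
pub struct ArrowSchema { _private: [u8; 0] }

/// Mirrors `struct ArrowArrayStream` from the Arrow C stream interface.
#[repr(C)]
pub struct ArrowArrayStream {
    pub get_schema: Option<unsafe extern "C" fn(*mut ArrowArrayStream, *mut ArrowSchema) -> c_int>,
    pub get_next: Option<unsafe extern "C" fn(*mut ArrowArrayStream, *mut ArrowArray) -> c_int>,
    pub get_last_error: Option<unsafe extern "C" fn(*mut ArrowArrayStream) -> *const c_char>,
    pub release: Option<unsafe extern "C" fn(*mut ArrowArrayStream)>,
    pub private_data: *mut c_void,
}

/// Hypothetical wrapper that owns an imported stream and iterates over it.
/// Callers remain responsible for releasing each yielded `ArrowArray`.
pub struct StreamReader { stream: ArrowArrayStream }

impl Iterator for StreamReader {
    type Item = Result<ArrowArray, c_int>;

    fn next(&mut self) -> Option<Self::Item> {
        let get_next = self.stream.get_next?;
        let mut out = ArrowArray::empty();
        let code = unsafe { get_next(&mut self.stream, &mut out) };
        if code != 0 {
            return Some(Err(code)); // non-zero is an errno-style error code
        }
        // A released array (release == NULL) signals end of stream.
        if out.release.is_some() { Some(Ok(out)) } else { None }
    }
}

impl Drop for StreamReader {
    fn drop(&mut self) {
        if let Some(release) = self.stream.release {
            unsafe { release(&mut self.stream) };
        }
    }
}

// --- Toy producer, for demonstration only ---

unsafe extern "C" fn toy_release_array(array: *mut ArrowArray) {
    unsafe { (*array).release = None };
}

unsafe extern "C" fn toy_get_next(stream: *mut ArrowArrayStream, out: *mut ArrowArray) -> c_int {
    unsafe {
        let remaining = (*stream).private_data as *mut usize;
        ptr::write(out, ArrowArray::empty());
        if *remaining > 0 {
            *remaining -= 1;
            (*out).release = Some(toy_release_array);
        }
    }
    0
}

unsafe extern "C" fn toy_release(stream: *mut ArrowArrayStream) {
    unsafe {
        drop(Box::from_raw((*stream).private_data as *mut usize));
        (*stream).release = None;
    }
}

/// Builds a stream that yields `n` zero-length arrays, then ends.
pub fn toy_stream(n: usize) -> StreamReader {
    StreamReader { stream: ArrowArrayStream {
        get_schema: None,
        get_next: Some(toy_get_next),
        get_last_error: None,
        release: Some(toy_release),
        private_data: Box::into_raw(Box::new(n)) as *mut c_void,
    }}
}
```

With this shape, `toy_stream(3).collect::<Vec<_>>()` drives `get_next` until the producer returns a released array, and dropping the reader calls the stream's `release` callback. A real implementation would presumably also call `get_schema` and convert each `ArrowArray` into a `RecordBatch`.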
   
   **Describe alternatives you've considered**
   
   We can already use FFI to bring over individual record batches. In 
https://github.com/datafusion-contrib/datafusion-python/pull/21 , I 
experimented with just wrapping a Python iterator and moving each batch 
individually, but ran into deadlocks involving the Python GIL.
   
   **Additional context**
   
   The Arrow C Stream interface was introduced in August 2020, in 
https://github.com/apache/arrow/pull/8052. So far it has been used to 
send record batch streams to DuckDB from the R and Python implementations.
   

