kylebarron opened a new issue, #6586:
URL: https://github.com/apache/arrow-rs/issues/6586

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   It is not currently possible to use arrow-rs's FFI to exchange something 
like an `ArrayStream` or `ChunkedArray` when those arrays do not represent 
RecordBatches. 
[`ffi_stream::ArrowArrayStreamReader`](https://docs.rs/arrow/latest/arrow/ffi_stream/struct.ArrowArrayStreamReader.html)
 will error if the data type of the stream is not `Struct`.
   
   This makes it impossible in the general case to interop with a 
`pyarrow.ChunkedArray` or `polars.Series` (via Python).
   
   The Arrow C Stream Interface _does_ support non-struct array types. 
`get_next()` of `ArrowArrayStream` returns an `ArrowArray`, and an `ArrowArray` 
can be any generic Arrow array. That Arrow array is _often_ a StructArray, with 
the understanding that the StructArray represents a RecordBatch, but it doesn't 
have to be.
   
   Here:
   
https://github.com/apache/arrow-rs/blob/5508978a3c5c4eb65ef6410e097887a8adaba38a/arrow-array/src/ffi_stream.rs#L364-L367
   you _assume_ that the data type of the stream is struct (and also assume 
that you can interpret the C Schema as a `Schema`), but that isn't required by 
the spec. To be more generic, you can [use the data type of the C Schema 
directly](https://github.com/kylebarron/arro3/blob/0829e34fe250314c2e068ff86e3c5e7ad003d607/pyo3-arrow/src/ffi/from_python/ffi_stream.rs#L89-L91).
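
   To illustrate the difference between the two interpretations, here is a minimal, dependency-free sketch. The `DataType` and `Schema` types and the `as_schema` / `as_data_type` functions are hypothetical stand-ins, not arrow-rs APIs. They only model the idea that a strict reader rejects any stream whose type is not a struct, while a generic reader accepts whatever type the C Schema describes.

```rust
/// Stand-in for an Arrow data type (hypothetical, not `arrow::datatypes::DataType`).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Utf8,
    /// Child fields as (name, type) pairs.
    Struct(Vec<(String, DataType)>),
}

/// Stand-in for a record batch schema: a list of named columns.
#[derive(Debug, PartialEq)]
struct Schema {
    fields: Vec<(String, DataType)>,
}

/// Strict interpretation: the stream type must be a struct, whose children
/// become the columns of a schema. Anything else is an error.
fn as_schema(stream_type: &DataType) -> Result<Schema, String> {
    match stream_type {
        DataType::Struct(fields) => Ok(Schema {
            fields: fields.clone(),
        }),
        other => Err(format!("expected Struct, got {other:?}")),
    }
}

/// Generic interpretation: the stream type is used as-is, so a stream of
/// plain Int32 or Utf8 arrays is just as valid as a stream of structs.
fn as_data_type(stream_type: &DataType) -> DataType {
    stream_type.clone()
}
```

   In this model, `as_schema(&DataType::Int32)` returns an error, while `as_data_type(&DataType::Int32)` simply returns `DataType::Int32`. A struct type works with both.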
   
   **Describe the solution you'd like**
   
   Some way to transfer a stream of `Array` via FFI.
   
   **Describe alternatives you've considered**
   
   There's currently no way to exchange a stream of generic arrays with 
arrow-rs, as far as I can tell.
   
   **Additional context**
   
   For full disclosure, I've already implemented this in my own library, 
pyo3-arrow. I have an 
[`ArrayReader`](https://docs.rs/pyo3-arrow/latest/pyo3_arrow/ffi/trait.ArrayReader.html)
 trait to parallel `arrow::RecordBatchReader`, and [vendored a derived copy of 
`ffi_stream.rs`](https://github.com/kylebarron/arro3/blob/0829e34fe250314c2e068ff86e3c5e7ad003d607/pyo3-arrow/src/ffi/from_python/ffi_stream.rs)
 to make it possible to handle this interop (while not necessarily 
materializing the entire stream as a `ChunkedArray`).
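
   As a rough sketch of what such a trait could look like, the following dependency-free model pairs an iterator of chunks with a single shared field. Every name here (`Field`, `Chunk`, `ArrayReader`, `VecReader`, `total_len`) is a stand-in chosen for illustration; none of it is the actual pyo3-arrow or arrow-rs definition.

```rust
/// Stand-in for an Arrow field: a name plus a type label (hypothetical).
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: String,
}

/// Stand-in for one array chunk; a real reader would yield Arrow arrays.
type Chunk = Vec<i32>;

/// A stream of array chunks that all share one field, modeled on the idea
/// of a record batch reader but without requiring a struct type.
trait ArrayReader: Iterator<Item = Result<Chunk, String>> {
    /// The field (name and type) describing every chunk in the stream.
    fn field(&self) -> Field;
}

/// Example reader backed by an in-memory sequence of chunks.
struct VecReader {
    field: Field,
    chunks: std::vec::IntoIter<Chunk>,
}

impl Iterator for VecReader {
    type Item = Result<Chunk, String>;

    fn next(&mut self) -> Option<Self::Item> {
        self.chunks.next().map(Ok)
    }
}

impl ArrayReader for VecReader {
    fn field(&self) -> Field {
        self.field.clone()
    }
}

/// Consumes the stream one chunk at a time, so the whole stream never has
/// to be held in memory at once.
fn total_len(reader: impl ArrayReader) -> Result<usize, String> {
    let mut n = 0;
    for chunk in reader {
        n += chunk?.len();
    }
    Ok(n)
}
```

   The point of the design is that a consumer such as `total_len` pulls chunks lazily and can stop or fail partway through, which is what distinguishes a reader from an eagerly collected chunked array.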
   
   I'm currently fine with my vendored copy of FFI, but others may have the 
same issue.
   
   Previous discussion in 
https://github.com/apache/arrow-rs/issues/5295#issuecomment-2402556354

