Re: [PR] Implement a `Vec` wrapper for `pyarrow.Table` convenience [arrow-rs]

via GitHub Fri, 07 Nov 2025 12:28:39 -0800


jonded94 commented on code in PR #8790:
URL: https://github.com/apache/arrow-rs/pull/8790#discussion_r2505370704



##########
arrow-pyarrow/src/lib.rs:
##########
@@ -44,17 +44,20 @@
 //! | `pyarrow.Array`             | [ArrayData]                                                        |
 //! | `pyarrow.RecordBatch`       | [RecordBatch]                                                      |
 //! | `pyarrow.RecordBatchReader` | [ArrowArrayStreamReader] / `Box<dyn RecordBatchReader + Send>` (1) |
+//! | `pyarrow.Table`             | [Table] (2)                                                        |
 //!
 //! (1) `pyarrow.RecordBatchReader` can be imported as [ArrowArrayStreamReader]. Either
 //! [ArrowArrayStreamReader] or `Box<dyn RecordBatchReader + Send>` can be exported
 //! as `pyarrow.RecordBatchReader`. (`Box<dyn RecordBatchReader + Send>` is typically
 //! easier to create.)
 //!
-//! PyArrow has the notion of chunked arrays and tables, but arrow-rs doesn't
-//! have these same concepts. A chunked table is instead represented with
-//! `Vec<RecordBatch>`. A `pyarrow.Table` can be imported to Rust by calling
-//! [pyarrow.Table.to_reader()](https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.to_reader)
-//! and then importing the reader as a [ArrowArrayStreamReader].
+//! (2) Although arrow-rs offers a [pyarrow.Table](https://arrow.apache.org/docs/python/generated/pyarrow.Table)
+//! convenience wrapper [Table] (which internally holds `Vec<RecordBatch>`), this is more meant for
+//! use cases where you already have `Vec<RecordBatch>` on the Rust side and want to export that in
+//! bulk as a `pyarrow.Table`. In general, it is recommended to use streaming approaches instead of
+//! dealing with bulk data.
+//! For example, a `pyarrow.Table` can be imported to Rust through `PyArrowType<ArrowArrayStreamReader>`
+//! instead (since `pyarrow.Table` implements the ArrayStream PyCapsule interface).

Review Comment:
   > I think it would be good to note here that another advantage of using ArrowArrayStreamReader is that it works with tables and stream input out of the box.
   
   I added that to the docs.
   
   > Also reading through the docs again, I'd suggest making a reference to Box<dyn RecordBatchReader> rather than ArrowArrayStreamReader. The former is a higher level API and much easier to use.
   
   I'm not exactly sure what you mean here. `Box<dyn RecordBatchReader>` only implements *IntoPyArrow*, not *FromPyArrow*. The example in the new documentation is about *consuming* a `pyarrow.Table` in Rust with a streaming approach, and `Box<dyn RecordBatchReader>` sadly doesn't help there. One has to use `ArrowArrayStreamReader`, since that is the type that implements `FromPyArrow`.
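
To make the asymmetry concrete, here is a minimal toy model of it. Everything in it is a hypothetical stand-in and not the arrow-rs or pyo3 API: `Batch` plays the role of `RecordBatch`, `BatchReader` of `RecordBatchReader`, `StreamReader` of `ArrowArrayStreamReader`, and `FromForeign` / `IntoForeign` of `FromPyArrow` / `IntoPyArrow`. It only sketches the pattern: import through the concrete reader type, then box it if the higher-level trait object is preferred downstream.

```rust
// Toy model only: every type and trait below is a hypothetical stand-in,
// not the arrow-rs or pyo3 API.

/// Stand-in for `RecordBatch`: one chunk of rows.
#[derive(Debug, Clone, PartialEq)]
pub struct Batch(pub Vec<i64>);

/// Stand-in for `RecordBatchReader`: a stream of batches.
pub trait BatchReader: Iterator<Item = Batch> {}

/// Stand-in for `ArrowArrayStreamReader`: the concrete reader type.
pub struct StreamReader {
    chunks: std::vec::IntoIter<Batch>,
}

impl Iterator for StreamReader {
    type Item = Batch;

    fn next(&mut self) -> Option<Batch> {
        self.chunks.next()
    }
}

impl BatchReader for StreamReader {}

/// Stand-in for `FromPyArrow` (the import direction).
pub trait FromForeign: Sized {
    fn from_foreign(chunks: Vec<Vec<i64>>) -> Self;
}

/// Stand-in for `IntoPyArrow` (the export direction).
pub trait IntoForeign {
    fn into_foreign(self) -> Vec<Vec<i64>>;
}

// Import exists only for the concrete reader type.
impl FromForeign for StreamReader {
    fn from_foreign(chunks: Vec<Vec<i64>>) -> Self {
        let batches: Vec<Batch> = chunks.into_iter().map(Batch).collect();
        StreamReader { chunks: batches.into_iter() }
    }
}

// Export exists for the boxed trait object; there is no `FromForeign` impl for it.
impl IntoForeign for Box<dyn BatchReader + Send> {
    fn into_foreign(self) -> Vec<Vec<i64>> {
        self.map(|batch| batch.0).collect()
    }
}

/// Consume an imported table one batch at a time instead of in bulk.
pub fn count_rows(reader: StreamReader) -> usize {
    reader.map(|batch| batch.0.len()).sum()
}

/// Import through the concrete type, then box it for code that wants the trait object.
pub fn import_boxed(chunks: Vec<Vec<i64>>) -> Box<dyn BatchReader + Send> {
    Box::new(StreamReader::from_foreign(chunks))
}
```

In this model the boxed reader is still available on the consuming side, but only after going through the concrete type first, which is the step the documentation example describes.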



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
