jonded94 commented on code in PR #8790:
URL: https://github.com/apache/arrow-rs/pull/8790#discussion_r2505370704
##########
arrow-pyarrow/src/lib.rs:
##########
@@ -44,17 +44,20 @@
//! | `pyarrow.Array` | [ArrayData] |
//! | `pyarrow.RecordBatch` | [RecordBatch] |
//! | `pyarrow.RecordBatchReader` | [ArrowArrayStreamReader] / `Box<dyn RecordBatchReader + Send>` (1) |
+//! | `pyarrow.Table` | [Table] (2) |
//!
//! (1) `pyarrow.RecordBatchReader` can be imported as [ArrowArrayStreamReader]. Either
//! [ArrowArrayStreamReader] or `Box<dyn RecordBatchReader + Send>` can be exported
//! as `pyarrow.RecordBatchReader`. (`Box<dyn RecordBatchReader + Send>` is typically
//! easier to create.)
//!
-//! PyArrow has the notion of chunked arrays and tables, but arrow-rs doesn't
-//! have these same concepts. A chunked table is instead represented with
-//! `Vec<RecordBatch>`. A `pyarrow.Table` can be imported to Rust by calling
-//! [pyarrow.Table.to_reader()](https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.to_reader)
-//! and then importing the reader as a [ArrowArrayStreamReader].
+//! (2) Although arrow-rs offers a [pyarrow.Table](https://arrow.apache.org/docs/python/generated/pyarrow.Table)
+//! convenience wrapper [Table] (which internally holds `Vec<RecordBatch>`), this is more meant for
+//! use cases where you already have `Vec<RecordBatch>` on the Rust side and want to export that in
+//! bulk as a `pyarrow.Table`. In general, it is recommended to use streaming approaches instead of
+//! dealing with bulk data.
+//! For example, a `pyarrow.Table` can be imported to Rust through `PyArrowType<ArrowArrayStreamReader>`
+//! instead (since `pyarrow.Table` implements the ArrayStream PyCapsule interface).
Review Comment:
> I think it would be good to note here that another advantage of using
> ArrowArrayStreamReader is that it works with tables and stream input out of the
> box.

I added that in the docs.

> Also reading through the docs again, I'd suggest making a reference to
> Box<dyn RecordBatchReader> rather than ArrowArrayStreamReader. The former is a
> higher level API and much easier to use.

I'm not exactly sure what you mean here. `Box<dyn RecordBatchReader>` only
implements *IntoPyArrow*, not *FromPyArrow*. The example in the new
documentation is about *consuming* a `pyarrow.Table` in Rust with a streaming
approach, and sadly `Box<dyn RecordBatchReader>` doesn't help there. One has
to use `ArrowArrayStreamReader`, since that properly implements `FromPyArrow`.
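
For illustration, the import direction could look roughly like the sketch
below. It assumes a PyO3 extension module built against the `arrow` crate with
its `pyarrow` feature enabled; `count_rows` is a hypothetical function name,
not something from this PR. Because the argument type is
`PyArrowType<ArrowArrayStreamReader>`, the caller should be able to pass
either a `pyarrow.RecordBatchReader` or a `pyarrow.Table`, and the batches are
consumed one at a time rather than collected in bulk.

```rust
// Sketch only: assumes the `arrow` crate (feature "pyarrow") and `pyo3`.
use arrow::ffi_stream::ArrowArrayStreamReader;
use arrow::pyarrow::PyArrowType;
use pyo3::exceptions::PyValueError;
use pyo3::prelude::*;

/// Hypothetical example function: counts the rows of any stream-like
/// PyArrow object without materializing all batches at once.
#[pyfunction]
fn count_rows(input: PyArrowType<ArrowArrayStreamReader>) -> PyResult<usize> {
    let mut total = 0;
    // `ArrowArrayStreamReader` iterates over `Result<RecordBatch, ArrowError>`.
    for batch in input.0 {
        // Surface Arrow errors to Python as a ValueError.
        let batch = batch.map_err(|e| PyValueError::new_err(e.to_string()))?;
        total += batch.num_rows();
    }
    Ok(total)
}
```

Taking `Box<dyn RecordBatchReader + Send>` as the parameter type here would
not compile as a `#[pyfunction]` argument, because only the export direction
(`IntoPyArrow`) is implemented for it.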