paleolimbot commented on code in PR #39985:
URL: https://github.com/apache/arrow/pull/39985#discussion_r1488508302
##########
python/pyarrow/table.pxi:
##########
@@ -1327,6 +1327,68 @@ cdef class ChunkedArray(_PandasConvertible):
             result += self.chunk(i).to_pylist()
         return result
+    def __arrow_c_stream__(self, requested_schema=None):
+        """
+        Export to a C ArrowArrayStream PyCapsule.
+
+        Parameters
+        ----------
+        requested_schema : PyCapsule, default None
+            The schema to which the stream should be casted, passed as a
+            PyCapsule containing a C ArrowSchema representation of the
+            requested schema.
+
+        Returns
+        -------
+        PyCapsule
+            A capsule containing a C ArrowArrayStream struct.
+        """
+        cdef:
+            ArrowArrayStream* c_stream = NULL
+            ChunkedArray chunked = self
+
+        if requested_schema is not None:
+            out_type = DataType._import_from_c_capsule(requested_schema)
+            if self.type != out_type:
+                chunked = self.cast(out_type)
Review Comment:
Nowhere! A roundtrip *without* casting is almost certainly faster than a
roundtrip *with* casting, and it is already twice as slow as the cast +
roundtrip. I'm sure there's room to make a better benchmark (I'm also running a
C++ debug build), but I'm personally convinced that the cast + export solution
is not so bad that it should not be attempted.

> Well, I gave a possible solution above.

For RecordBatches? I don't think we have a way to do that for a stream of
`Array` in Arrow C++ or in pyarrow?

I'm happy to remove the feature as well and leave it to be implemented properly
later... I didn't anticipate it being controversial.
--
This is an automated message from the Apache Git Service.