pitrou commented on code in PR #39985:
URL: https://github.com/apache/arrow/pull/39985#discussion_r1488542240


##########
python/pyarrow/table.pxi:
##########
@@ -1327,6 +1327,68 @@ cdef class ChunkedArray(_PandasConvertible):
             result += self.chunk(i).to_pylist()
         return result
 
+    def __arrow_c_stream__(self, requested_schema=None):
+        """
+        Export to a C ArrowArrayStream PyCapsule.
+
+        Parameters
+        ----------
+        requested_schema : PyCapsule, default None
+            The schema to which the stream should be cast, passed as a
+            PyCapsule containing a C ArrowSchema representation of the
+            requested schema.
+
+        Returns
+        -------
+        PyCapsule
+            A capsule containing a C ArrowArrayStream struct.
+        """
+        cdef:
+            ArrowArrayStream* c_stream = NULL
+            ChunkedArray chunked = self
+
+        if requested_schema is not None:
+            out_type = DataType._import_from_c_capsule(requested_schema)
+            if self.type != out_type:
+                chunked = self.cast(out_type)
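
A consumer-side sketch of how the exported capsule could be used, assuming
a PyArrow build that includes this method (the requested cast is expressed
as an ArrowSchema capsule, here obtained from `DataType.__arrow_c_schema__`):

```python
import pyarrow as pa

ca = pa.chunked_array([[1, 2], [3, 4]])

# Ask the producer to cast to float64 by passing an ArrowSchema capsule.
requested = pa.float64().__arrow_c_schema__()
capsule = ca.__arrow_c_stream__(requested)

# The capsule wraps a C ArrowArrayStream; a consuming library would hand it
# to its own C-level import routine.
print(capsule)  # <capsule object "arrow_array_stream" at 0x...>
```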

Review Comment:
   > I'm happy to remove the feature as well and leave to be implemented 
properly later...I didn't anticipate it being controversial.
   
   It's not controversial. The implementation is. I'm sure for simple 
benchmarks with a small dataset, an otherwise idle machine, and enough RAM to 
hold multiple copies of the dataset, casting everything at once can seem 
slightly faster because it saves some overhead. That doesn't make it a viable 
strategy in the general case.
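   
   A minimal sketch of the lazy, per-chunk alternative (illustrative only;
   the `casted_chunks` helper below is an assumption, not code from this
   PR): cast each chunk only when the consumer pulls it, so peak memory
   stays at roughly one extra chunk instead of a second full copy of the
   dataset.
   
   ```python
   import pyarrow as pa
   
   def casted_chunks(chunked: pa.ChunkedArray, out_type: pa.DataType):
       # Yield casted chunks on demand instead of materializing
       # chunked.cast(out_type) up front.
       for chunk in chunked.iterchunks():
           yield chunk.cast(out_type)
   
   ca = pa.chunked_array([[1, 2], [3, 4, 5]])      # two int64 chunks
   for casted in casted_chunks(ca, pa.float64()):  # casts happen lazily
       print(casted.type, len(casted))
   ```
   
   An exporter could drive the ArrowArrayStream from a generator like this
   rather than building a fully casted ChunkedArray before handing out the
   capsule.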
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
