jorisvandenbossche opened a new issue, #38010: URL: https://github.com/apache/arrow/issues/38010
https://github.com/apache/arrow/pull/37797 is adding official dunder methods to expose the Arrow C Data/Stream Interface in Python using PyCapsules (https://github.com/apache/arrow/issues/34031 / https://github.com/apache/arrow/issues/35531). In addition to the official dunders that expose this to other libraries, we also need public APIs in pyarrow to import / consume such PyCapsules (or rather, the objects that implement the dunders and give you the PyCapsule).

https://github.com/apache/arrow/pull/37797 already added this to the `pa.array(..)`, `pa.record_batch(..)` and `pa.schema(..)` constructors, so that you can, for example, create a pyarrow array with `pa.array(obj)` from any object `obj` that supports the interface by defining `__arrow_c_array__`.

But that's not fully complete: we certainly also need a way to construct a `RecordBatchReader`, for which no such factory function exists. For this, we could add a `from_` function (similar to the existing `from_batches`), like `RecordBatchReader.from_stream`? (In addition, there are also the Table, Field and DataType constructors, but those all have factory functions that could support this, similar to `pa.array(..)` et al.)

---

Secondly, I am also wondering whether we want to provide APIs that accept PyCapsules directly, instead of an object that implements the dunders. For example, if you are a library that has data in Arrow-compatible memory and you want to convert it to pyarrow through the C Data Interface, you might want to pass a PyCapsule directly if your library doesn't expose a Python class that represents that data. That avoids having to create a small wrapper class with just the dunder to pass to the pyarrow constructor (although writing one is of course not difficult).
