[
https://issues.apache.org/jira/browse/ARROW-228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rok Mihevc updated ARROW-228:
-----------------------------
External issue URL: https://github.com/apache/arrow/issues/15573
> [Python] Create an Arrow-cpp-compatible interface for reading bytes from
> Python file-like objects
> --------------------------------------------------------------------------------------------------
>
> Key: ARROW-228
> URL: https://issues.apache.org/jira/browse/ARROW-228
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Python
> Reporter: Wes McKinney
> Assignee: Wes McKinney
> Priority: Major
> Fix For: 0.2.0
>
>
> In practice, IO interfaces in PyArrow will need to be bidirectional:
> - Exposing internal IO interfaces written purely in C++ to Python users as
> file-like objects
> - Exposing Python file-like objects to the C++ IO subsystem
> To do this efficiently, we may want to introduce an arrow::Buffer subclass
> that manages the lifetime of a PyBytes object in a GIL-safe way (i.e., on
> destruction, the GIL is acquired and the object's refcount is decremented).
> We can still implement a Read method that copies bytes into some other
> buffer, after which the PyBytes is immediately destroyed.
> Outside of these byte buffer management issues, wrapping a file-like object
> (having read() -> bytes, seek(), tell(), and other basic file methods) is
> fairly straightforward, and will allow any of the current or upcoming IO
> adapters to read either from native classes (file system, HDFS, etc.) or
> arbitrary Python streams.
> To give a concrete example: consider the response body of an HTTP GET
> request -- this can be put in an {{io.BytesIO}} object and then treated as a
> first-class citizen alongside the native (C++) IO classes.
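
As a rough illustration of the wrapping described in the issue, here is a minimal pure-Python sketch. It is not the Arrow implementation: the class name FileLikeReader and its read_into method are hypothetical, and the response body is a hard-coded stand-in rather than a real HTTP request. It only shows the shape of the idea: any object with read() -> bytes, seek(), and tell() can sit behind one small adapter, and a copying read can hand data to a caller-owned buffer so the temporary bytes object does not need to outlive the call.

```python
import io


class FileLikeReader:
    """Hypothetical adapter over any Python file-like object.

    Assumes only that the wrapped object provides read(n) -> bytes,
    seek(pos), and tell().
    """

    def __init__(self, handle):
        self._handle = handle

    def tell(self):
        return self._handle.tell()

    def seek(self, position):
        self._handle.seek(position)

    def read(self, nbytes):
        # Returns a new bytes object, exactly as the wrapped read() does.
        return self._handle.read(nbytes)

    def read_into(self, nbytes, out):
        # Copying read: fill a caller-owned writable buffer and return the
        # number of bytes copied. The temporary bytes object is no longer
        # referenced once this method returns.
        if nbytes > len(out):
            raise ValueError("output buffer is smaller than nbytes")
        chunk = self._handle.read(nbytes)
        count = len(chunk)
        memoryview(out)[:count] = chunk
        return count


# Stand-in for the body of an HTTP GET response (no network access here).
response_body = b"col_a,col_b\n1,2\n3,4\n"
reader = FileLikeReader(io.BytesIO(response_body))

header = reader.read(12)               # first line, including the newline
position_after_header = reader.tell()

buffer = bytearray(8)
copied = reader.read_into(8, buffer)   # remaining bytes, copied into buffer

reader.seek(0)
first_byte = reader.read(1)
```

In the C++ design sketched in the issue, the non-copying read() path is where a Buffer subclass holding the PyBytes reference would come in; that part has no direct analogue in this pure-Python sketch.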