Wes McKinney created ARROW-228:
----------------------------------
Summary: [Python] Create an Arrow-cpp-compatible interface for
reading bytes from Python file-like objects
Key: ARROW-228
URL: https://issues.apache.org/jira/browse/ARROW-228
Project: Apache Arrow
Issue Type: New Feature
Components: Python
Reporter: Wes McKinney
Assignee: Wes McKinney
In practice, IO interfaces in PyArrow will need to be bidirectional:
- Exposing internal IO interfaces written purely in C++ to Python users as
file-like objects
- Exposing Python file-like objects to the C++ IO subsystem
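Here is a rough pure-Python sketch of the first direction. {{NativeReader}} is a hypothetical stand-in for a C++ random-access file and is not an Arrow API. {{NativeFile}} exposes it through the standard {{io.RawIOBase}} protocol, so Python code can call read(), seek() and tell() on it, or wrap it in {{io.BufferedReader}}.

```python
import io


class NativeReader:
    """Hypothetical stand-in for a C++ random-access file (not a real Arrow API)."""

    def __init__(self, data: bytes) -> None:
        self._data = data

    def size(self) -> int:
        return len(self._data)

    def read_at(self, offset: int, nbytes: int) -> bytes:
        # Positional read that does not move any cursor.
        return self._data[offset:offset + nbytes]


class NativeFile(io.RawIOBase):
    """Expose a NativeReader through the standard Python file protocol."""

    def __init__(self, reader: NativeReader) -> None:
        super().__init__()
        self._reader = reader
        self._pos = 0

    def readable(self) -> bool:
        return True

    def seekable(self) -> bool:
        return True

    def readinto(self, buf) -> int:
        # Fill the caller's buffer; io.RawIOBase builds read()/readall() on this.
        chunk = self._reader.read_at(self._pos, len(buf))
        n = len(chunk)
        buf[:n] = chunk
        self._pos += n
        return n

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        if whence == io.SEEK_SET:
            new_pos = offset
        elif whence == io.SEEK_CUR:
            new_pos = self._pos + offset
        elif whence == io.SEEK_END:
            new_pos = self._reader.size() + offset
        else:
            raise ValueError(f"invalid whence: {whence}")
        if new_pos < 0:
            raise ValueError("negative seek position")
        self._pos = new_pos
        return self._pos

    def tell(self) -> int:
        return self._pos
```

Implementing readinto() is enough for {{io.RawIOBase}} to supply read() and readall(), which keeps the native side's interface small.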
To do this efficiently, we may want to introduce an arrow::Buffer subclass that
manages the lifetime of a PyBytes object in a GIL-safe way (i.e., on
destruction, the GIL is acquired and the object's refcount is decremented), so
that bytes returned by a Python read() can be used without copying. We can also
still implement a Read method that copies the bytes into some other buffer,
after which the PyBytes object is released immediately.
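The two ownership strategies can be sketched in Python as an analogy, assuming {{f}} is any binary file-like object. The GIL acquisition on destruction would live in the C++ Buffer subclass and is not shown. A memoryview over the returned bytes keeps that object alive for as long as the view exists (the zero-copy case). Copying into a caller-owned buffer instead lets the temporary bytes object be released as soon as the read returns. Both function names are hypothetical.

```python
import io


def read_zero_copy(f, nbytes: int) -> memoryview:
    # The view holds a reference to the bytes object returned by read(), so
    # those bytes stay alive as long as the view does (analogous to a Buffer
    # subclass that owns a PyBytes reference).
    data = f.read(nbytes)
    return memoryview(data)


def read_copy_into(f, out: bytearray, offset: int, nbytes: int) -> int:
    # Copy into a caller-owned, fixed-size buffer; the temporary bytes object
    # is dropped when this function returns (the copying Read case).
    data = f.read(nbytes)
    n = len(data)
    # Assigning through a memoryview raises ValueError instead of growing `out`
    # if the data does not fit at `offset`.
    memoryview(out)[offset:offset + n] = data
    return n
```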
Apart from these buffer-management issues, wrapping a file-like object
(one with read() -> bytes, seek(), tell(), and other basic file methods) is
fairly straightforward, and will allow any of the current or upcoming IO
adapters to read either from native classes (file system, HDFS, etc.) or from
arbitrary Python streams.
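Here is a minimal sketch of such a wrapper, assuming the wrapped object provides read(n), seek(offset, whence) and tell(). {{PythonFileReader}} and its method names are hypothetical and only loosely modeled on a C++ random-access reader. They are not the PyArrow API.

```python
import io
import os
from typing import BinaryIO


class PythonFileReader:
    """Hypothetical adapter presenting a Python file-like object through a
    C++-style reader interface (method names are illustrative only)."""

    def __init__(self, handle: BinaryIO) -> None:
        self._handle = handle

    def tell(self) -> int:
        return self._handle.tell()

    def seek(self, position: int) -> None:
        self._handle.seek(position, os.SEEK_SET)

    def read(self, nbytes: int) -> bytes:
        # Loop because read() on unbuffered streams may return short reads;
        # stop early only at end of stream.
        chunks = []
        remaining = nbytes
        while remaining > 0:
            chunk = self._handle.read(remaining)
            if not chunk:
                break
            chunks.append(chunk)
            remaining -= len(chunk)
        return b"".join(chunks)

    def size(self) -> int:
        # Seek to the end to learn the size, then restore the old position.
        current = self._handle.tell()
        self._handle.seek(0, os.SEEK_END)
        end = self._handle.tell()
        self._handle.seek(current, os.SEEK_SET)
        return end

    def read_at(self, position: int, nbytes: int) -> bytes:
        # Positional read built from seek + read; the stream is left where
        # the read ended.
        self.seek(position)
        return self.read(nbytes)
```

Because the adapter only relies on read(), seek() and tell(), the same class works for {{io.BytesIO}}, open files, or any user-defined stream.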
To give a concrete example: consider the body of an HTTP GET response. It can
be put in an {{io.BytesIO}} object and then treated as a first-class citizen
alongside the native (C++) IO classes.
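The sketch below keeps the example offline by hard-coding a small payload as a stand-in for the response body. In practice the body might come from something like {{urllib.request.urlopen(url).read()}}. The length-prefixed framing and {{read_length_prefixed}} are made up for illustration. The point is that one reader consumes both an on-disk file and an {{io.BytesIO}}.

```python
import io
import struct
import tempfile


def read_length_prefixed(stream) -> bytes:
    # Read a 4-byte little-endian length followed by that many payload bytes.
    # Only read() is used, so any binary file-like object works.
    (length,) = struct.unpack("<I", stream.read(4))
    payload = stream.read(length)
    if len(payload) != length:
        raise EOFError("truncated payload")
    return payload


# Stand-in for the body of an HTTP GET response (no network access here).
body = struct.pack("<I", 5) + b"arrow"

# In-memory source: the response body wrapped in io.BytesIO.
from_memory = read_length_prefixed(io.BytesIO(body))

# Native source: the same bytes written to and read back from a temporary file.
with tempfile.TemporaryFile() as f:
    f.write(body)
    f.seek(0)
    from_disk = read_length_prefixed(f)

print(from_memory == from_disk)  # True
```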