Lubo Slivka created ARROW-15969:
-----------------------------------
Summary: [Python] Add conversion from RecordBatchFileReader to
RecordBatchReader
Key: ARROW-15969
URL: https://issues.apache.org/jira/browse/ARROW-15969
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Reporter: Lubo Slivka
The suggested improvement is to introduce a conversion/adapter so that all
batches from a RecordBatchFileReader can be read one by one through the
RecordBatchReader interface.
Perhaps a new instance method, RecordBatchFileReader.to_reader()? This would
follow the precedent of, for instance, pyarrow.flight.MetadataRecordBatchReader,
which also has to_reader().
*Motivation*
Record batches serialized in the IPC file format can be read using
RecordBatchFileReader. The interface of this reader is incompatible with
RecordBatchReader.
This affects, for instance, Flight RPC DoGet: pyarrow.flight.RecordBatchStream
accepts a Table or a RecordBatchReader, so it is not possible to send out all
the data from an IPC file efficiently (e.g. fully in C++). There may also be
other use cases where client code wants to read data batch by batch
transparently, without caring about the serialization format.
Further background is here:
[https://lists.apache.org/thread/b9jwk103fgxfo4kct12t00ymdft7bklb]