Lubo Slivka created ARROW-15969:
-----------------------------------

             Summary: [Python] Add conversion from RecordBatchFileReader to 
RecordBatchReader
                 Key: ARROW-15969
                 URL: https://issues.apache.org/jira/browse/ARROW-15969
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Python
            Reporter: Lubo Slivka


The suggested improvement is to introduce a conversion/adapter so that all 
batches in a RecordBatchFileReader can be read one by one, in a single pass, 
through the RecordBatchReader interface.

Perhaps a new instance method RecordBatchFileReader.to_reader()? This would 
follow the precedent of, for instance, 
pyarrow.flight.MetadataRecordBatchReader, which also has to_reader().

*Motivation*

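Record batches serialized in the IPC file format can be read using 
RecordBatchFileReader. The interface of this reader is incompatible with 
RecordBatchReader: the file reader provides access to batches by index, 
whereas RecordBatchReader is a forward-only stream of batches.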

This affects, for instance, the Flight RPC DoGet, where it is not possible to 
efficiently (i.e. fully in C++) send out all data from an IPC file using 
pyarrow.flight.RecordBatchStream. There may also be other use cases where 
client code wants to read data batch by batch transparently, without caring 
about the serialization format.

Further background is here: 
[https://lists.apache.org/thread/b9jwk103fgxfo4kct12t00ymdft7bklb]

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
