[
https://issues.apache.org/jira/browse/ARROW-15969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
David Li updated ARROW-15969:
-----------------------------
Summary: [C++][Python] Add conversion from RecordBatchFileReader to
RecordBatchReader (was: [Python] Add conversion from RecordBatchFileReader to
RecordBatchReader)
> [C++][Python] Add conversion from RecordBatchFileReader to RecordBatchReader
> ----------------------------------------------------------------------------
>
> Key: ARROW-15969
> URL: https://issues.apache.org/jira/browse/ARROW-15969
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Lubo Slivka
> Priority: Major
>
> The suggested improvement is to introduce a conversion/adapter so that all
> batches from RecordBatchFileReader can be read one-by-one using
> RecordBatchReader.
> Perhaps a new instance method, RecordBatchFileReader.to_reader()? This would
> follow the precedent of, for instance,
> pyarrow.flight.MetadataRecordBatchReader, which also has to_reader().
> *Motivation*
> Record batches serialized into the IPC file format can be read using
> RecordBatchFileReader. The interface of this reader is incompatible with
> RecordBatchReader.
> This affects, for instance, the Flight RPC DoGet, where it is not possible
> to send out all data efficiently (e.g. fully in C++) by using
> pyarrow.flight.RecordBatchStream. However, there may be other use cases where
> client code wants to read data batch-by-batch transparently, without caring
> about the serialization format.
> Further background is here:
> [https://lists.apache.org/thread/b9jwk103fgxfo4kct12t00ymdft7bklb]
>
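For illustration only, the requested adapter can be approximated in pure Python today. This is a minimal sketch, not the proposed implementation: the helper names iter_batches and to_reader are hypothetical, and the only pyarrow pieces assumed are RecordBatchFileReader.num_record_batches, get_batch(), schema, and RecordBatchReader.from_batches().

```python
def iter_batches(file_reader):
    """Yield every record batch of a random-access IPC file reader, in order.

    Relies only on the `num_record_batches` attribute and `get_batch(i)`,
    both of which pyarrow's RecordBatchFileReader provides.
    """
    for i in range(file_reader.num_record_batches):
        yield file_reader.get_batch(i)


def to_reader(file_reader):
    """Hypothetical helper: expose a RecordBatchFileReader as a RecordBatchReader."""
    # Imported lazily so that iter_batches itself has no pyarrow dependency.
    import pyarrow as pa

    return pa.RecordBatchReader.from_batches(
        file_reader.schema, iter_batches(file_reader)
    )
```

With this, something like to_reader(pyarrow.ipc.open_file(path)) could be handed to pyarrow.flight.RecordBatchStream. Because the batches come from a Python generator, each batch still round-trips through Python, so this workaround does not give the fully-in-C++ path that the issue asks for.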
--
This message was sent by Atlassian Jira
(v8.20.1#820001)