[ 
https://issues.apache.org/jira/browse/ARROW-15969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17511883#comment-17511883
 ] 

Alessandro Molina commented on ARROW-15969:
-------------------------------------------

{quote}or it is necessary to add some kind of Reset() function to allow 
streaming the whole file again.
{quote}
Uhm, I'm not sure why that would be so strange; {{seek}} is a fairly common 
capability for file readers.

I do see how making a separate continuous reader for the file might be 
considered more "pure", but I can't refrain from wondering if we are making 
things harder to use and reason about just to enforce a principle of single 
responsibility. I mean "you have to create a reader out of the reader to 
actually be able to read it" doesn't seem exactly practical from a user's 
point of view.

> [C++][Python] Add conversion from RecordBatchFileReader to RecordBatchReader
> ----------------------------------------------------------------------------
>
>                 Key: ARROW-15969
>                 URL: https://issues.apache.org/jira/browse/ARROW-15969
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, Python
>            Reporter: Lubo Slivka
>            Priority: Major
>
> The suggested improvement is to introduce a conversion/adapter so that all 
> batches from RecordBatchFileReader can be read one-by-one using 
> RecordBatchReader.
> Perhaps a new instance method RecordBatchFileReader.to_reader()? This would 
> follow the example of, for instance, pyarrow.flight.MetadataRecordBatchReader, 
> which also has to_reader().
> *Motivation*
> Record Batches serialized into IPC file format can be read using 
> RecordBatchFileReader. The interface of this reader is incompatible with 
> RecordBatchReader.
> This impacts for instance the Flight RPC DoGet, where it is not possible to 
> efficiently (e.g. fully in C++) send out all data by using 
> pyarrow.flight.RecordBatchStream. However, there may be other use cases where 
> client code wants to read data batch-by-batch transparently, without caring 
> about the serialization format.
> Further background is here: 
> [https://lists.apache.org/thread/b9jwk103fgxfo4kct12t00ymdft7bklb]
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
