[jira] [Commented] (ARROW-4283) [Python] Should RecordBatchStreamReader/Writer be AsyncIterable?

Jira Mon, 05 Dec 2022 00:46:05 -0800


    [ 
https://issues.apache.org/jira/browse/ARROW-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643162#comment-17643162
 ]


Rémi Dettai commented on ARROW-4283:
------------------------------------

Hi! I would like to revive this thread. We have a similar use case where we need 
an async interface in PyArrow for IPC streams.

> RecordBatchStreamReader: this is too high-level; you need to read from your 
> data source in Python (using `await something.read()`) then construct a 
> record batch out of the data (perhaps with a BufferReader)

To be able to construct the record batch, we need to know how many bytes we 
need to read. Getting that information implies:
* reading the metadata size
* parsing the metadata
* getting the body size from the parsed metadata

*The big issue here is that PyArrow doesn't seem to expose the right primitives 
for that, in particular parsing the metadata.*
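
The three steps above can be sketched roughly as follows. This is only an 
illustration, not an existing PyArrow API: `read_ipc_message` and 
`parse_body_length` are hypothetical names, and `parse_body_length` stands in 
for the metadata-parsing primitive that appears to be missing. The sketch 
assumes the current IPC stream framing (a 0xFFFFFFFF continuation marker, 
then a little-endian int32 metadata size, with a size of 0 marking the end of 
the stream).

```python
import asyncio
import struct
from typing import Callable, Optional, Tuple

# Marker that precedes every message in the current IPC stream framing.
CONTINUATION = 0xFFFFFFFF


async def read_ipc_message(
    reader: asyncio.StreamReader,
    parse_body_length: Callable[[bytes], int],  # hypothetical: the missing primitive
) -> Optional[Tuple[bytes, bytes]]:
    """Read one message; return (metadata, body), or None at end of stream."""
    # Step 1: read the metadata size (continuation marker + int32 length).
    marker, metadata_size = struct.unpack("<Ii", await reader.readexactly(8))
    if marker != CONTINUATION:
        raise ValueError("expected IPC continuation marker")
    if metadata_size == 0:
        return None  # end-of-stream marker

    # Step 2: read the metadata bytes (they still have to be parsed).
    metadata = await reader.readexactly(metadata_size)

    # Step 3: get the body size from the parsed metadata, then read the body.
    body_length = parse_body_length(metadata)
    body = await reader.readexactly(body_length)
    return metadata, body
```

The returned bytes could then be handed to the existing synchronous readers, 
for example through a BufferReader, as suggested in the quoted comment.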

I believe that asyncio is quickly gaining popularity and, since Arrow is an 
exchange format, it will end up being used in many use cases like the one 
mentioned by [~paul.e.taylor], where async is very valuable.

> [Python] Should RecordBatchStreamReader/Writer be AsyncIterable?
> ----------------------------------------------------------------
>
>                 Key: ARROW-4283
>                 URL: https://issues.apache.org/jira/browse/ARROW-4283
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Paul Taylor
>            Priority: Minor
>
> Filing this issue after a discussion today with [~xhochy] about how to 
> implement streaming pyarrow http services. I had attempted to use both Flask 
> and [aiohttp|https://aiohttp.readthedocs.io/en/stable/streams.html]'s 
> streaming interfaces because they seemed familiar, but no dice. I have no 
> idea how hard this would be to add -- supporting all the asynciterable 
> primitives in JS was non-trivial.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
