[jira] [Commented] (ARROW-4283) [Python] Should RecordBatchStreamReader/Writer be AsyncIterable?

Weston Pace (Jira) Fri, 09 Dec 2022 08:59:07 -0800


    [ 
https://issues.apache.org/jira/browse/ARROW-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17645401#comment-17645401
 ]


Weston Pace commented on ARROW-4283:
------------------------------------

Also, a note on {{RecordBatchStreamWriter}}.  In most cases you can probably 
get away with not making this async.  If you are writing to disk then the write 
is typically implicitly async.  The write call merely does a memcpy from 
user space to kernel space (into the page cache), marks the pages dirty, and 
then returns immediately, without waiting for the data to be persisted to 
disk.  The only time this blocks is when you are out of memory and swapping, 
in which case you might have to wait for some physical memory to become 
available.  We do run into this on large datasets, though, and are currently 
investigating a direct I/O alternative which would, unfortunately, require 
async.  So that would be the exception to "in most cases".
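
To make the buffered-write behaviour concrete, here is a minimal sketch using only the Python standard library. It is not Arrow code: {{write_bytes}} and its {{durable}} flag are hypothetical names chosen for illustration. With {{durable=False}} the function returns once the bytes have been handed to the kernel; with {{durable=True}} it additionally calls {{os.fsync}}, which waits for the data to be persisted.

```python
import os
import tempfile


def write_bytes(path: str, payload: bytes, durable: bool = False) -> int:
    """Write payload to path and return the number of bytes written.

    Hypothetical helper for illustration only, not part of pyarrow.
    """
    # O_BINARY only exists on Windows; fall back to 0 elsewhere.
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | getattr(os, "O_BINARY", 0)
    fd = os.open(path, flags, 0o644)
    try:
        view = memoryview(payload)
        written = 0
        while written < len(view):
            # os.write hands the bytes to the kernel and returns; it does
            # not wait for them to be persisted to the storage device.
            written += os.write(fd, view[written:])
        if durable:
            # Explicitly wait until the kernel has flushed the file.
            os.fsync(fd)
        return written
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "batch.bin")
        print(write_bytes(target, b"\x00" * 1024))
```

The default path is the one described above as implicitly async: the caller gets control back without paying for the disk round trip.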

Cloud filesystems behave similarly: we create an S3 request, submit that 
request to an I/O thread under the hood, and simply expose a non-blocking 
write method.  (Well, I'm quite certain Arrow's S3 writer is implicitly async, 
and the others should be able to be.)
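
The general shape of such a writer can be sketched with the Python standard library. This is an illustration of the pattern only, not Arrow's S3 implementation: {{BackgroundWriter}} and the {{upload}} callable are hypothetical stand-ins for the real request machinery.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, List


class BackgroundWriter:
    """Hypothetical writer whose write() hands work to an I/O thread."""

    def __init__(self, upload: Callable[[bytes], None]) -> None:
        # `upload` stands in for whatever actually sends the request.
        self._upload = upload
        # A single worker thread keeps chunks in submission order.
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending: List[Future] = []

    def write(self, chunk: bytes) -> None:
        # Copy the chunk so the caller may reuse its buffer, then return
        # without waiting for the upload to finish.
        self._pending.append(self._pool.submit(self._upload, bytes(chunk)))

    def close(self) -> None:
        # Wait for all outstanding uploads, then surface any failure.
        self._pool.shutdown(wait=True)
        for future in self._pending:
            future.result()


if __name__ == "__main__":
    received: List[bytes] = []
    writer = BackgroundWriter(received.append)
    writer.write(b"first")
    writer.write(b"second")
    writer.close()
    print(received)
```

One consequence of this design is that errors from a given write only become visible later, here in {{close()}}, so callers need to treat closing the writer as the point where the write is actually confirmed.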

> [Python] Should RecordBatchStreamReader/Writer be AsyncIterable?
> ----------------------------------------------------------------
>
>                 Key: ARROW-4283
>                 URL: https://issues.apache.org/jira/browse/ARROW-4283
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Paul Taylor
>            Priority: Minor
>
> Filing this issue after a discussion today with [~xhochy] about how to 
> implement streaming pyarrow http services. I had attempted to use both Flask 
> and [aiohttp|https://aiohttp.readthedocs.io/en/stable/streams.html]'s 
> streaming interfaces because they seemed familiar, but no dice. I have no 
> idea how hard this would be to add -- supporting all the asynciterable 
> primitives in JS was non-trivial.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
