[ 
https://issues.apache.org/jira/browse/ARROW-15741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496304#comment-17496304
 ] 

Weston Pace commented on ARROW-15741:
-------------------------------------

I agree that we should not allow concurrent calls.  I don't think we lose any 
functionality by doing this.

A source may or may not be able to be accessed truly in parallel.  As an 
example of the negative case, if the stream is backed by a TCP socket then 
calling get_next from multiple threads is probably useless (the threads would 
just block waiting on the socket anyway).

On the other hand, if the stream is a dataset scan on an S3 data source then 
the source can (and should) be fetching multiple batches of data at once.  
Instead of allowing concurrent calls to get_next, the producer should just be 
configured (external to the stream API) to allow some amount of readahead, and 
the producer should buffer internally.
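
To make the readahead idea concrete, here is a minimal Python sketch.  It is 
not Arrow's implementation: ReadaheadStream, fetch_batch, num_batches and the 
readahead parameter are all hypothetical names chosen for illustration.  The 
producer keeps a bounded number of fetches in flight on its own thread pool, 
while get_next is only ever called by one thread at a time and returns batches 
in order.

```python
import collections
import time
from concurrent.futures import ThreadPoolExecutor


class ReadaheadStream:
    """Hypothetical producer: fetches in parallel, hands batches out in order."""

    def __init__(self, fetch_batch, num_batches, readahead=4):
        # Readahead is configured here, outside of the get_next call itself.
        self._fetch_batch = fetch_batch
        self._num_batches = num_batches
        self._readahead = readahead
        self._pool = ThreadPoolExecutor(max_workers=readahead)
        self._pending = collections.deque()  # futures, in batch order
        self._next_index = 0
        self._fill()

    def _fill(self):
        # Keep at most `readahead` fetches in flight.
        while (len(self._pending) < self._readahead
               and self._next_index < self._num_batches):
            future = self._pool.submit(self._fetch_batch, self._next_index)
            self._pending.append(future)
            self._next_index += 1

    def get_next(self):
        """Not safe for concurrent calls. Returns None at end of stream."""
        if not self._pending:
            self._pool.shutdown()
            return None
        batch = self._pending.popleft().result()  # wait for the oldest fetch
        self._fill()
        return batch


def fetch_batch(index):
    # Stand-in for a slow remote read (e.g. one S3 request per batch).
    time.sleep(0.01)
    return list(range(index * 3, index * 3 + 3))


stream = ReadaheadStream(fetch_batch, num_batches=5, readahead=2)
batches = []
while (batch := stream.get_next()) is not None:
    batches.append(batch)
```

Because only one thread touches the deque, the consumer-facing side needs no 
locking; all of the parallelism lives behind get_next.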

Note that my opinion is probably heavily influenced by the fact that this is 
how we handle this problem today.

> [Format] Clarify thread safety of C stream interface
> ----------------------------------------------------
>
>                 Key: ARROW-15741
>                 URL: https://issues.apache.org/jira/browse/ARROW-15741
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Format
>            Reporter: Jorge Leitão
>            Priority: Major
>
> The C stream interface has the method `get_next` that mutates the producer 
> side.
> We do not mention whether there is any thread-safety guarantee associated 
> with this interface. For example, can the interface be shared between two 
> threads and both call get_next at the same time?


