[ https://issues.apache.org/jira/browse/ARROW-15741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496304#comment-17496304 ]
Weston Pace commented on ARROW-15741:
-------------------------------------
I agree that we should not allow concurrent calls. I don't think we lose any
functionality by doing so.
A source may or may not support truly parallel access. As an example of the
negative case, if the stream is backed by a TCP socket then calling get_next
concurrently from multiple threads is probably useless (the threads will just
block waiting on the socket anyway).
On the other hand, if the stream is a dataset scan on an S3 data source then
the source can (and should) be accessing multiple batches of data at once.
Instead of allowing concurrent calls to get_next, the producer should just be
configured (external to the stream API) to allow some amount of readahead, and
the producer should buffer internally.
Note that my opinion is probably heavily influenced by the fact that this is
how we handle this problem today.
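
To make the readahead idea concrete, here is a minimal sketch in Python. It is
not Arrow's implementation and does not use the C stream interface itself: the
class ReadaheadStream, the fetch_batch callable, and the readahead parameter
are all hypothetical names chosen for illustration. The point is only that the
parallelism lives inside the producer while the consumer calls get_next from
one thread at a time.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


class ReadaheadStream:
    """Hypothetical single-consumer stream with internal readahead.

    fetch_batch(i) is an assumed callable returning batch number i.
    Up to `readahead` fetches are kept in flight by the producer.
    """

    def __init__(self, fetch_batch, num_batches, readahead=4):
        self._fetch = fetch_batch
        self._num_batches = num_batches
        self._next_to_schedule = 0
        self._pool = ThreadPoolExecutor(max_workers=readahead)
        self._pending = deque()  # futures, in stream order
        for _ in range(readahead):
            self._schedule_one()

    def _schedule_one(self):
        # Start fetching the next batch if any remain.
        if self._next_to_schedule < self._num_batches:
            future = self._pool.submit(self._fetch, self._next_to_schedule)
            self._pending.append(future)
            self._next_to_schedule += 1

    def get_next(self):
        """Return the next batch in order, or None at end of stream.

        Not thread-safe: callers must not invoke this concurrently.
        """
        if not self._pending:
            self._pool.shutdown()
            return None
        batch = self._pending.popleft().result()  # wait for the oldest fetch
        self._schedule_one()  # refill the readahead window
        return batch


def read_all(stream):
    """Drain a stream from a single thread and return its batches."""
    batches = []
    while True:
        batch = stream.get_next()
        if batch is None:
            return batches
        batches.append(batch)


if __name__ == "__main__":
    stream = ReadaheadStream(lambda i: f"batch-{i}", num_batches=5, readahead=2)
    print(read_all(stream))
```

With this shape, a source that cannot be read in parallel (such as the TCP
socket case) would simply be configured with a readahead of 1, and the
consumer-facing contract stays the same.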
> [Format] Clarify thread safety of C stream interface
> ----------------------------------------------------
>
> Key: ARROW-15741
> URL: https://issues.apache.org/jira/browse/ARROW-15741
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Format
> Reporter: Jorge Leitão
> Priority: Major
>
> The C stream interface has the method `get_next` that mutates the producer
> side.
> We do not mention whether there is any thread safety guarantee associated
> with this interface. For example, can the interface be shared between two
> threads, with both calling get_next at the same time?