[
https://issues.apache.org/jira/browse/ARROW-6854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17394273#comment-17394273
]
Weston Pace commented on ARROW-6854:
------------------------------------
The projection is now executed as a scalar kernel, which should be stateless
(and thread-safe). [~bkietz] (sorry to chain CC). We do have some tests that
run multiple scan tasks in parallel, although I don't know whether they use
complex projections.
> [Dataset][C++] RecordBatchProjector is not thread safe
> ------------------------------------------------------
>
> Key: ARROW-6854
> URL: https://issues.apache.org/jira/browse/ARROW-6854
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Reporter: Francois Saint-Jacques
> Priority: Major
>
> While working on ARROW-6769 I noted that RecordBatchProjector is not thread
> safe. My goal is to use this class to wrap the ScanTaskIterator in another
> ScanTaskIterator that projects, so producers (fragments) don't have to know
> about this schema. The issue is that ScanTasks are expected to run on
> concurrent threads, so the projector will be invoked by multiple threads.
> The lack of concurrency safety is due to the adaptivity to input schemas:
> `SetInputSchema` stores state in a local cache. I suggest we refactor into 2
> classes.
> # `RecordBatchProjector` which will work with a static `from` schema, i.e.
> no adaptivity. The schema is defined at construction time. This class is
> thread safe to invoke after construction since no local state is modified.
> # `AdaptiveRecordBatchProjector` which will have a cache map[schema_hash,
> std::shared_ptr<RecordBatchProjector>] protected with a mutex.
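
For illustration, the two-class split proposed above could look roughly like
the sketch below. It does not use Arrow's actual API: `Schema`, `Batch`,
`StaticProjector` and `AdaptiveProjector` are hypothetical stand-ins (a schema
is just a list of field names, a batch is one int per field, and 0 stands in
for a null column). The cache is also keyed by the schema itself rather than
by a schema hash, an assumption made here only to sidestep hash collisions in
a toy version.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for arrow::Schema and arrow::RecordBatch.
using Schema = std::vector<std::string>;  // field names only
using Batch = std::vector<int>;           // one value per field

// Projects batches from a fixed `from` schema. All state is computed in the
// constructor and never modified afterwards.
class StaticProjector {
 public:
  StaticProjector(const Schema& from, const Schema& to) {
    for (const std::string& name : to) {
      int index = -1;  // -1 marks a field that is missing from `from`
      for (std::size_t i = 0; i < from.size(); ++i) {
        if (from[i] == name) index = static_cast<int>(i);
      }
      indices_.push_back(index);
    }
  }

  // const and touches no mutable state, so concurrent calls are safe.
  Batch Project(const Batch& batch) const {
    Batch out;
    for (int index : indices_) {
      // 0 stands in for a null column in this sketch.
      out.push_back(index < 0 ? 0 : batch[static_cast<std::size_t>(index)]);
    }
    return out;
  }

 private:
  std::vector<int> indices_;
};

// Adapts to varying input schemas by caching one StaticProjector per schema.
class AdaptiveProjector {
 public:
  explicit AdaptiveProjector(Schema to) : to_(std::move(to)) {}

  Batch Project(const Schema& from, const Batch& batch) {
    std::shared_ptr<const StaticProjector> projector;
    {
      // The lock only guards the cache lookup and insertion.
      std::lock_guard<std::mutex> lock(mutex_);
      auto& slot = cache_[from];
      if (!slot) slot = std::make_shared<const StaticProjector>(from, to_);
      projector = slot;
    }
    // The projection itself runs outside the lock.
    return projector->Project(batch);
  }

  std::size_t cache_size() {
    std::lock_guard<std::mutex> lock(mutex_);
    return cache_.size();
  }

 private:
  Schema to_;
  std::mutex mutex_;
  std::map<Schema, std::shared_ptr<const StaticProjector>> cache_;
};
```

In this sketch the mutex is held only while the cache is consulted, so scan
tasks that share an input schema contend briefly on the lookup and then
project in parallel through the immutable `StaticProjector`.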
--
This message was sent by Atlassian Jira
(v8.3.4#803005)