[
https://issues.apache.org/jira/browse/ARROW-6854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17394130#comment-17394130
]
Antoine Pitrou commented on ARROW-6854:
---------------------------------------
Does this still apply? cc [~westonpace]
> [Dataset][C++] RecordBatchProjector is not thread safe
> ------------------------------------------------------
>
> Key: ARROW-6854
> URL: https://issues.apache.org/jira/browse/ARROW-6854
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Reporter: Francois Saint-Jacques
> Priority: Major
>
> While working on ARROW-6769 I noted that RecordBatchProjector is not thread
> safe. My goal is to use this class to wrap the ScanTaskIterator in another
> ScanTaskIterator that projects, so producers (fragments) don't have to know
> about this schema. The issue is that ScanTasks are expected to run on
> concurrent threads, so the projector will be invoked by multiple threads.
> The lack of concurrency safety is due to the adaptivity to input schemas:
> `SetInputSchema` stores its result in a local cache. I suggest we refactor
> into 2 classes.
> # `RecordBatchProjector`, which will work with a static `from` schema, i.e.
> no adaptivity. The schema is defined at construction time. This class is
> thread safe to invoke after construction since no local modification is done.
> # `AdaptiveRecordBatchProjector` which will have a cache map[schema_hash,
> std::shared_ptr<RecordBatchProjector>] protected with a mutex.
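
The two-class split proposed in the description could look roughly like the
sketch below. It is not Arrow code: `Schema`, `Batch`, `kMissing`,
`StaticProjector`, `AdaptiveProjector` and `RunDemo` are hypothetical
stand-ins, and the cache is keyed by the schema itself rather than by a
schema hash so that the toy version does not have to deal with collisions.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Toy stand-ins (assumptions): a schema is a list of field names and a batch
// holds one value per field, in schema order.
using Schema = std::vector<std::string>;
using Batch = std::vector<int>;
constexpr int kMissing = -1;  // hypothetical placeholder for absent fields

// Fixed `from` schema. All state is computed in the constructor, so the const
// Project() method can be called from many threads at once.
class StaticProjector {
 public:
  StaticProjector(const Schema& from, const Schema& to) {
    for (const std::string& name : to) {
      int index = kMissing;
      for (std::size_t i = 0; i < from.size(); ++i) {
        if (from[i] == name) index = static_cast<int>(i);
      }
      source_index_.push_back(index);
    }
  }

  Batch Project(const Batch& input) const {
    Batch out;
    for (int index : source_index_) {
      out.push_back(index == kMissing
                        ? kMissing
                        : input[static_cast<std::size_t>(index)]);
    }
    return out;
  }

 private:
  std::vector<int> source_index_;
};

// Adaptive wrapper: one StaticProjector per distinct input schema, kept in a
// mutex-protected cache. The lock covers only the cache lookup and insert.
class AdaptiveProjector {
 public:
  explicit AdaptiveProjector(Schema to) : to_(std::move(to)) {}

  Batch Project(const Schema& from, const Batch& input) {
    return GetOrCreate(from)->Project(input);
  }

  std::size_t cache_size() {
    std::lock_guard<std::mutex> lock(mutex_);
    return cache_.size();
  }

 private:
  std::shared_ptr<const StaticProjector> GetOrCreate(const Schema& from) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = cache_.find(from);
    if (it == cache_.end()) {
      auto projector = std::make_shared<StaticProjector>(from, to_);
      it = cache_.emplace(from, std::move(projector)).first;
    }
    return it->second;
  }

  const Schema to_;
  std::mutex mutex_;
  std::map<Schema, std::shared_ptr<const StaticProjector>> cache_;
};

// Usage: four threads share one adaptive projector and feed it batches that
// follow two different input schemas.
void RunDemo() {
  AdaptiveProjector projector(Schema{"a", "c"});
  std::vector<Batch> results(4);
  std::vector<std::thread> workers;
  for (std::size_t t = 0; t < results.size(); ++t) {
    workers.emplace_back([&projector, &results, t] {
      results[t] = (t % 2 == 0)
          ? projector.Project(Schema{"a", "b", "c"}, Batch{1, 2, 3})
          : projector.Project(Schema{"c", "a"}, Batch{30, 10});
    });
  }
  for (std::thread& worker : workers) worker.join();
  assert(projector.cache_size() == 2);  // one cached projector per schema
}
```

The point of the split is that only the adaptive wrapper needs a lock. Once a
`StaticProjector` has been handed out, callers use it through a pointer to
const and never mutate it, so the projection itself runs without holding the
mutex.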