[ 
https://issues.apache.org/jira/browse/ARROW-6854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francois Saint-Jacques updated ARROW-6854:
------------------------------------------
    Summary: [Dataset][C++] RecordBatchProjector is not thread safe  (was: 
[Dataset] RecordBatchProjector is not thread safe)

> [Dataset][C++] RecordBatchProjector is not thread safe
> ------------------------------------------------------
>
>                 Key: ARROW-6854
>                 URL: https://issues.apache.org/jira/browse/ARROW-6854
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: Francois Saint-Jacques
>            Priority: Major
>
> While working on ARROW-6769 I noted that RecordBatchProjector is not thread 
> safe. My goal is to use this class to wrap the ScanTaskIterator in another 
> ScanTaskIterator that projects, so producers (fragments) don't have to know 
> about this schema. The issue is that ScanTasks are expected to run on 
> concurrent threads, so the projector will be invoked by multiple threads.
> The lack of concurrency safety is due to the adaptivity to input schemas: 
> `SetInputSchema` writes to a local cache. I suggest we refactor into 2 
> classes. 
>  # `RecordBatchProjector` which will work with a static `from` schema, i.e. 
> no adaptivity. The schema is defined at construction time. This class is 
> thread safe to invoke after construction since no local state is modified.
>  # `AdaptiveRecordBatchProjector` which will have a cache map[schema_hash, 
> std::shared_ptr<RecordBatchProjector>] protected with a mutex. 
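
The proposed split could be sketched roughly as follows. This is a minimal
illustration, not the Arrow implementation: `Schema`, `Batch`,
`StaticProjector` and `AdaptiveProjector` are hypothetical stand-ins, missing
columns are zero-filled where a real projector would presumably emit nulls,
and the cache is keyed by the schema itself rather than by a schema hash.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Stand-in types (assumption, not the Arrow API): a schema is a list of
// field names and a batch holds one int column per field.
using Schema = std::vector<std::string>;
struct Batch {
  Schema schema;
  std::vector<std::vector<int>> columns;
};

// Static projector: `from` and `to` are fixed at construction, so Project()
// is const and can be called from several threads without locking.
class StaticProjector {
 public:
  StaticProjector(const Schema& from, Schema to) : to_(std::move(to)) {
    for (const auto& name : to_) {
      int index = -1;  // -1 marks a field absent from the input schema
      for (std::size_t i = 0; i < from.size(); ++i) {
        if (from[i] == name) index = static_cast<int>(i);
      }
      column_map_.push_back(index);
    }
  }

  Batch Project(const Batch& in) const {
    Batch out{to_, {}};
    const std::size_t rows = in.columns.empty() ? 0 : in.columns[0].size();
    for (int index : column_map_) {
      if (index >= 0) {
        out.columns.push_back(in.columns[static_cast<std::size_t>(index)]);
      } else {
        out.columns.emplace_back(rows, 0);  // placeholder for a null column
      }
    }
    return out;
  }

 private:
  Schema to_;
  std::vector<int> column_map_;
};

// Adaptive projector: one static projector per input schema, kept in a
// cache that is only read or written while holding the mutex.
class AdaptiveProjector {
 public:
  explicit AdaptiveProjector(Schema to) : to_(std::move(to)) {}

  Batch Project(const Batch& in) {
    // The lock is released before projecting; only the lookup is serialized.
    return GetOrCreate(in.schema)->Project(in);
  }

  std::size_t cache_size() {
    std::lock_guard<std::mutex> lock(mutex_);
    return cache_.size();
  }

 private:
  std::shared_ptr<const StaticProjector> GetOrCreate(const Schema& from) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto& slot = cache_[from];
    if (!slot) slot = std::make_shared<const StaticProjector>(from, to_);
    return slot;
  }

  Schema to_;
  std::mutex mutex_;
  std::map<Schema, std::shared_ptr<const StaticProjector>> cache_;
};
```

In this sketch the mutex guards only the cache lookup and insertion. The
projection itself runs on an immutable `StaticProjector`, so concurrent scan
tasks would contend only briefly even when they share one adaptive projector.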



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
