Francois Saint-Jacques updated ARROW-6854:
    Summary: [Dataset][C++] RecordBatchProjector is not thread safe  (was: 
[Dataset] RecordBatchProjector is not thread safe)

> [Dataset][C++] RecordBatchProjector is not thread safe
> ------------------------------------------------------
>                 Key: ARROW-6854
>                 URL: https://issues.apache.org/jira/browse/ARROW-6854
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: Francois Saint-Jacques
>            Priority: Major
> While working on ARROW-6769 I noticed that RecordBatchProjector is not thread 
> safe. My goal is to use this class to wrap the ScanTaskIterator in another 
> ScanTaskIterator that projects, so producers (fragments) don't have to know 
> about the projected schema. The issue is that ScanTasks are expected to run 
> on concurrent threads, so the projector will be invoked by multiple threads.
> The lack of concurrency safety comes from adapting to input schemas: 
> `SetInputSchema` stores its state in a local cache. I suggest we refactor 
> into 2 classes:
>  # `RecordBatchProjector` which will work with a static `from` schema, i.e. 
> no adaptivity. The schema is defined at construction time. This class is 
> thread safe to invoke after construction since no member state is modified.
>  # `AdaptiveRecordBatchProjector` which will have a cache map[schema_hash, 
> std::shared_ptr<RecordBatchProjector>] protected with a mutex. 
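
The proposed split could look roughly like the sketch below. It is not Arrow's actual code: `Schema` and `Batch` are simplified stand-ins for `arrow::Schema` and `arrow::RecordBatch`, and the method names `Project`, `HashSchema` and `CacheSize` are hypothetical, chosen only for illustration.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Stand-ins for arrow::Schema and arrow::RecordBatch (assumptions, not Arrow's API).
using Schema = std::vector<std::string>;  // field names only
struct Batch {
  Schema schema;
  std::vector<int> columns;  // one value per field, to keep the sketch short
};

// Static projector: the `from` schema is fixed at construction, so Project()
// only reads members and can be called from several threads at once.
class RecordBatchProjector {
 public:
  RecordBatchProjector(const Schema& from, const Schema& to) : to_(to) {
    for (const auto& name : to) {
      int index = -1;  // -1 marks a field that is missing from `from`
      for (std::size_t i = 0; i < from.size(); ++i) {
        if (from[i] == name) index = static_cast<int>(i);
      }
      indices_.push_back(index);
    }
  }

  Batch Project(const Batch& in) const {
    Batch out{to_, {}};
    for (int index : indices_) {
      // 0 stands in for a null column when the field is missing.
      out.columns.push_back(index < 0 ? 0 : in.columns[static_cast<std::size_t>(index)]);
    }
    return out;
  }

 private:
  Schema to_;
  std::vector<int> indices_;
};

// Adaptive projector: a mutex guards the cache of static projectors,
// keyed by a hash of the input schema.
class AdaptiveRecordBatchProjector {
 public:
  explicit AdaptiveRecordBatchProjector(Schema to) : to_(std::move(to)) {}

  Batch Project(const Batch& in) {
    std::shared_ptr<RecordBatchProjector> projector;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      auto& slot = cache_[HashSchema(in.schema)];
      if (!slot) slot = std::make_shared<RecordBatchProjector>(in.schema, to_);
      projector = slot;
    }
    // The projection itself runs outside the lock.
    return projector->Project(in);
  }

  std::size_t CacheSize() {
    std::lock_guard<std::mutex> lock(mutex_);
    return cache_.size();
  }

 private:
  static std::size_t HashSchema(const Schema& schema) {
    std::size_t seed = schema.size();
    for (const auto& name : schema) {
      seed ^= std::hash<std::string>{}(name) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
    }
    return seed;
  }

  Schema to_;
  std::mutex mutex_;
  std::unordered_map<std::size_t, std::shared_ptr<RecordBatchProjector>> cache_;
};
```

The lock covers only the cache lookup and insertion. Keying by hash alone means two colliding schemas would share a projector, so a real implementation would probably also compare the schemas for equality.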

This message was sent by Atlassian Jira
