Francois Saint-Jacques created ARROW-6854:
---------------------------------------------
Summary: [Dataset] RecordBatchProjector is not thread safe
Key: ARROW-6854
URL: https://issues.apache.org/jira/browse/ARROW-6854
Project: Apache Arrow
Issue Type: Bug
Components: C++
Reporter: Francois Saint-Jacques
While working on ARROW-6769, I noticed that RecordBatchProjector is not thread
safe. My goal is to use this class to wrap the ScanTaskIterator in another
ScanTaskIterator that performs the projection, so that producers (fragments)
don't have to know about the target schema. The issue is that ScanTasks are
expected to run on concurrent threads, so the projector will be invoked from
multiple threads.
The lack of thread safety comes from the projector adapting to input schemas:
`SetInputSchema` stores its state in a local cache, which concurrent callers can
overwrite. I suggest we refactor it into two classes:
# `RecordBatchProjector`, which works with a static `from` schema, i.e. no
adaptivity. The schema is fixed at construction time. This class is thread safe
to invoke after construction since it performs no internal modification.
# `AdaptiveRecordBatchProjector`, which holds a cache
`map[schema_hash, std::shared_ptr<RecordBatchProjector>]` protected by a mutex.