[ https://issues.apache.org/jira/browse/ARROW-10014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ben Kietzman updated ARROW-10014:
---------------------------------
Fix Version/s: 4.0.0 (was: 3.0.0)
> [C++] TaskGroup::Finish should execute tasks
> --------------------------------------------
>
> Key: ARROW-10014
> URL: https://issues.apache.org/jira/browse/ARROW-10014
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Affects Versions: 1.0.1
> Reporter: Ben Kietzman
> Assignee: Ben Kietzman
> Priority: Major
> Fix For: 4.0.0
>
>
> Currently ThreadedTaskGroup::Finish blocks the current thread while it waits
> for worker threads to execute tasks. Instead, it could pop tasks from the
> queue and execute them, using the Finish-ing thread as a worker. This would
> enable basic nested parallelism using TaskGroup::MakeSubGroup() without the
> danger of deadlocking the thread pool.
> For example, when reading multiple Parquet files we would like to
> parallelize both across files and across columns within each file. We
> could support this basic nested parallelism by rewriting ParquetFileReader
> to accept any TaskGroup across which to scatter its column-reading tasks
> (rather than instantiating its own ThreadPool based on a boolean flag).
> File-reading tasks could then be scattered across a ThreadedTaskGroup, each
> of them creating a subgroup that runs all of that file's column-reading tasks.
> However, the above would currently deadlock when reading with
> {{(# files) * (# columns) >= (# threads)}}, since every task of the root
> TaskGroup would be blocked in its subgroup's call to Finish. To use
> TaskGroup::MakeSubGroup for basic nested parallelism, the Finish-ing thread
> must perform work in addition to checking for group completion.
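The proposed "Finish executes tasks" behavior can be illustrated with a minimal, self-contained sketch. It does not use Arrow at all: `Executor`, `Executor::Group`, `Append`, `Finish` and `RunNestedRead` are hypothetical names chosen for illustration, and a single mutex, condition variable and shared queue stand in for the internals of Arrow's ThreadPool and ThreadedTaskGroup. Each "file" task creates a subgroup of "column" tasks and calls `Finish` on it; because `Finish` pops and runs queued tasks instead of only waiting, the nested case completes even when there are fewer worker threads than files.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical, simplified executor (NOT Arrow's ThreadPool/TaskGroup API): one shared
// queue is drained both by worker threads and by any thread blocked in Finish().
class Executor {
 public:
  struct Group { int pending = 0; };  // tasks not yet completed; guarded by mu_

  explicit Executor(int num_threads) {
    for (int i = 0; i < num_threads; ++i) workers_.emplace_back([this] { WorkerLoop(); });
  }

  ~Executor() {
    std::unique_lock<std::mutex> lock(mu_);
    assert(queue_.empty());  // every group must be Finish()ed before destruction
    shutdown_ = true;
    lock.unlock();
    cv_.notify_all();
    for (auto& t : workers_) t.join();
  }

  // Queue `fn` as a member of `group`; a worker or a Finish-ing thread may run it.
  void Append(Group* group, std::function<void()> fn) {
    std::lock_guard<std::mutex> lock(mu_);
    ++group->pending;
    queue_.push_back([this, group, fn = std::move(fn)] {
      fn();
      std::lock_guard<std::mutex> done_lock(mu_);
      --group->pending;
      cv_.notify_all();  // wake threads waiting in Finish()
    });
    cv_.notify_all();  // wake idle workers and Finish-ing threads
  }

  // Instead of only waiting, run queued tasks (from any group) until `group`
  // has nothing pending; sleep only while the queue is empty.
  void Finish(Group* group) {
    std::unique_lock<std::mutex> lock(mu_);
    while (group->pending > 0) {
      if (queue_.empty()) {
        cv_.wait(lock);  // woken by Append() or by a task completing
        continue;
      }
      std::function<void()> task = std::move(queue_.front());
      queue_.pop_front();
      lock.unlock();
      task();  // the Finish-ing thread acts as a worker
      lock.lock();
    }
  }

 private:
  void WorkerLoop() {
    std::unique_lock<std::mutex> lock(mu_);
    for (;;) {
      cv_.wait(lock, [this] { return shutdown_ || !queue_.empty(); });
      if (queue_.empty()) return;  // shutting down and nothing left to run
      std::function<void()> task = std::move(queue_.front());
      queue_.pop_front();
      lock.unlock();
      task();
      lock.lock();
    }
  }

  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> queue_;
  bool shutdown_ = false;
  std::vector<std::thread> workers_;
};

// Hypothetical nested read: one root task per "file"; each file task appends one
// task per "column" to its own subgroup and then Finish()es that subgroup.
int RunNestedRead(int num_threads, int num_files, int num_columns) {
  std::atomic<int> columns_read{0};
  Executor executor(num_threads);
  Executor::Group root;
  for (int f = 0; f < num_files; ++f) {
    executor.Append(&root, [&executor, &columns_read, num_columns] {
      Executor::Group sub;
      for (int c = 0; c < num_columns; ++c) {
        executor.Append(&sub, [&columns_read] { columns_read.fetch_add(1); });
      }
      executor.Finish(&sub);  // helps run column tasks rather than only blocking
    });
  }
  executor.Finish(&root);
  return columns_read.load();
}
```

For example, `RunNestedRead(1, 4, 3)` (one worker thread, four files, three columns each) returns 12; with a Finish that only blocked, the single worker would sit in the first file's Finish with nobody left to run column tasks. One design consequence of this sketch: a Finish-ing thread may pick up an unrelated task (such as another file's task) and run it on its own stack, trading stack depth for guaranteed progress. Arrow's eventual implementation may choose differently.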
--
This message was sent by Atlassian Jira
(v8.3.4#803005)