[ 
https://issues.apache.org/jira/browse/ARROW-10014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Kietzman updated ARROW-10014:
---------------------------------
    Fix Version/s:     (was: 3.0.0)
                   4.0.0

> [C++] TaskGroup::Finish should execute tasks
> --------------------------------------------
>
>                 Key: ARROW-10014
>                 URL: https://issues.apache.org/jira/browse/ARROW-10014
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>    Affects Versions: 1.0.1
>            Reporter: Ben Kietzman
>            Assignee: Ben Kietzman
>            Priority: Major
>             Fix For: 4.0.0
>
>
> Currently ThreadedTaskGroup::Finish blocks the calling thread while it waits 
> for worker threads to execute tasks. Instead it could pop tasks from the queue 
> and execute them, using the Finish-ing thread as a worker. This would enable 
> basic nested parallelism using TaskGroup::MakeSubGroup() without risk of 
> deadlock.
> For example, when reading multiple Parquet files we would like to 
> parallelize both across files and across columns within each file. We 
> could support this basic nested parallelism by rewriting ParquetFileReader to 
> accept any TaskGroup across which to scatter its column reading tasks (rather 
> than instantiating its own ThreadPool based on a boolean flag). File 
> reading tasks could then be scattered across a ThreadedTaskGroup, each of them 
> creating a subgroup that runs all of its column reading tasks.
> However, the above would currently deadlock when {{(# files) * (# 
> columns) >= (# threads)}}, since every task of the root TaskGroup will be 
> blocked in its subgroup's call to Finish. In order to use 
> TaskGroup::MakeSubGroup for basic nested parallelism, the Finish-ing thread 
> must perform work in addition to checking for group completion.
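
The idea in the description can be illustrated with a minimal, self-contained sketch. It is not Arrow's implementation: SimplePool, HelpingTaskGroup, and ReadAllColumns are hypothetical names invented for this example, the inner group merely plays the role of a subgroup created by TaskGroup::MakeSubGroup(), and Finish spins with a yield where a real implementation would presumably wait on a condition variable. The one property it is meant to show is that Finish pops and runs queued tasks on the calling thread instead of only waiting.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical, simplified thread pool with one shared FIFO queue.
class SimplePool {
 public:
  explicit SimplePool(int num_threads) {
    for (int i = 0; i < num_threads; ++i) {
      workers_.emplace_back([this] { WorkerLoop(); });
    }
  }
  ~SimplePool() {
    { std::lock_guard<std::mutex> lock(mu_); stop_ = true; }
    cv_.notify_all();
    for (auto& worker : workers_) worker.join();
  }
  void Submit(std::function<void()> task) {
    { std::lock_guard<std::mutex> lock(mu_); queue_.push_back(std::move(task)); }
    cv_.notify_one();
  }
  // Runs one queued task on the calling thread; false if the queue was empty.
  bool TryRunOne() {
    std::function<void()> task;
    {
      std::lock_guard<std::mutex> lock(mu_);
      if (queue_.empty()) return false;
      task = std::move(queue_.front());
      queue_.pop_front();
    }
    task();
    return true;
  }

 private:
  void WorkerLoop() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return stop_ || !queue_.empty(); });
        if (queue_.empty()) return;  // stop requested and nothing left to run
        task = std::move(queue_.front());
        queue_.pop_front();
      }
      task();
    }
  }
  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> queue_;
  bool stop_ = false;
  std::vector<std::thread> workers_;  // declared last: threads use the members above
};

// Hypothetical task group whose Finish() works instead of only waiting.
class HelpingTaskGroup {
 public:
  explicit HelpingTaskGroup(SimplePool* pool) : pool_(pool) {}
  void Append(std::function<void()> task) {
    pending_.fetch_add(1);
    pool_->Submit([this, task = std::move(task)] { task(); pending_.fetch_sub(1); });
  }
  void Finish() {
    // The Finish-ing thread acts as a worker until this group's tasks are done.
    while (pending_.load() > 0) {
      if (!pool_->TryRunOne()) std::this_thread::yield();
    }
  }

 private:
  SimplePool* pool_;
  std::atomic<int> pending_{0};
};

// Nested parallelism: one outer task per file, one inner task per column.
int ReadAllColumns(int num_threads, int num_files, int num_columns) {
  assert(num_threads > 0);
  SimplePool pool(num_threads);
  std::atomic<int> columns_read{0};
  HelpingTaskGroup files(&pool);
  for (int f = 0; f < num_files; ++f) {
    files.Append([&pool, &columns_read, num_columns] {
      HelpingTaskGroup columns(&pool);  // stands in for a subgroup
      for (int c = 0; c < num_columns; ++c) {
        columns.Append([&columns_read] { columns_read.fetch_add(1); });
      }
      columns.Finish();  // drains the shared queue rather than blocking
    });
  }
  files.Finish();
  return columns_read.load();
}
```

In this sketch, ReadAllColumns(1, 4, 3) completes and returns 12 even though there are more file tasks than worker threads, because a thread inside Finish keeps pulling work from the shared queue. A Finish that only waited would leave the single worker stuck inside the first file task.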



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
