[ https://issues.apache.org/jira/browse/ARROW-10014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208391#comment-17208391 ]

Weston Pace commented on ARROW-10014:
-------------------------------------

I'm going to continue from the email discussion and investigation, and I have
added subtasks for my planned approach. It's slightly different from the one
laid out in the description: instead of having Finish run tasks, a FinishAsync
method will be added that returns immediately, so the caller gets off the
thread pool instead of blocking a worker while it waits. If anyone would
prefer that I open a new issue for this alternate approach instead of taking
over this one, please let me know.
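A minimal sketch of the FinishAsync idea in standard C++, assuming a toy thread pool. `SimplePool`, `AsyncTaskGroup` and `CountWithFinishAsync` are hypothetical names rather than Arrow APIs, and `std::shared_future` stands in for whatever future type the actual change uses. `FinishAsync()` only marks the group as finishing and hands back a future; whichever thread completes the last task fulfils it, so no pool thread is parked waiting for the group.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Minimal fixed-size thread pool for illustration (not Arrow's ThreadPool).
class SimplePool {
 public:
  explicit SimplePool(int n) {
    for (int i = 0; i < n; ++i) workers_.emplace_back([this] { Loop(); });
  }
  ~SimplePool() {  // workers drain the queue, then exit
    { std::lock_guard<std::mutex> lk(mu_); stop_ = true; }
    cv_.notify_all();
    for (auto& t : workers_) t.join();
  }
  void Submit(std::function<void()> fn) {
    { std::lock_guard<std::mutex> lk(mu_); queue_.push(std::move(fn)); }
    cv_.notify_one();
  }
 private:
  void Loop() {
    std::unique_lock<std::mutex> lk(mu_);
    while (true) {
      cv_.wait(lk, [this] { return stop_ || !queue_.empty(); });
      if (queue_.empty()) return;  // stopping and nothing left to run
      std::function<void()> fn = std::move(queue_.front());
      queue_.pop();
      lk.unlock();
      fn();  // run the job without holding the pool lock
      lk.lock();
    }
  }
  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> queue_;
  bool stop_ = false;
  std::vector<std::thread> workers_;
};

// Hypothetical task group: FinishAsync() returns a future right away instead of
// parking the caller until every task has run.
class AsyncTaskGroup {
 public:
  explicit AsyncTaskGroup(SimplePool* pool) : pool_(pool), st_(std::make_shared<State>()) {}
  void Append(std::function<void()> task) {
    { std::lock_guard<std::mutex> lk(st_->mu); ++st_->pending; }
    // The closure shares ownership of the state, so it may outlive the group.
    pool_->Submit([st = st_, task = std::move(task)] {
      task();
      Update(*st, /*task_done=*/true);
    });
  }
  // Returns immediately; the future becomes ready once all tasks have finished.
  std::shared_future<void> FinishAsync() {
    Update(*st_, /*task_done=*/false);
    return st_->future;
  }
 private:
  struct State {
    std::mutex mu;
    int pending = 0;
    bool finishing = false, fired = false;
    std::promise<void> promise;
    std::shared_future<void> future = promise.get_future().share();
  };
  static void Update(State& st, bool task_done) {
    bool fire = false;
    {
      std::lock_guard<std::mutex> lk(st.mu);
      if (task_done) --st.pending; else st.finishing = true;
      fire = st.finishing && st.pending == 0 && !st.fired;
      if (fire) st.fired = true;
    }
    if (fire) st.promise.set_value();  // wakes anyone waiting on the future
  }
  SimplePool* pool_;
  std::shared_ptr<State> st_;
};

// Usage: the wait happens on the caller's thread, outside the pool.
int CountWithFinishAsync(int n_tasks) {
  std::atomic<int> counter{0};
  SimplePool pool(2);
  AsyncTaskGroup group(&pool);
  for (int i = 0; i < n_tasks; ++i) group.Append([&counter] { ++counter; });
  std::shared_future<void> done = group.FinishAsync();  // does not block
  done.wait();
  return counter.load();
}
```

Because `std::future` has no way to attach a continuation, the usage function simply waits on the future from a thread outside the pool; a task running inside the pool would instead need to chain its remaining work onto the future rather than call `wait()` on it.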

> [C++] TaskGroup::Finish should execute tasks
> --------------------------------------------
>
>                 Key: ARROW-10014
>                 URL: https://issues.apache.org/jira/browse/ARROW-10014
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>    Affects Versions: 1.0.1
>            Reporter: Ben Kietzman
>            Assignee: Ben Kietzman
>            Priority: Major
>             Fix For: 3.0.0
>
>
> Currently ThreadedTaskGroup::Finish blocks the current thread while it waits
> for worker threads to execute tasks. Instead it could pop tasks from the queue
> and execute them, using the Finish-ing thread as a worker. This would enable
> basic nested parallelism using TaskGroup::MakeSubGroup() without the danger
> of deadlock from an accumulation of blocked threads.
> For example, when reading multiple Parquet files we would like to
> parallelize both across files and across columns within each file. We could
> support this basic nested parallelism by rewriting ParquetFileReader to
> accept any TaskGroup across which to scatter its column-reading tasks (rather
> than instantiating its own ThreadPool based on a boolean flag). File-reading
> tasks could then be scattered across a ThreadedTaskGroup, each of them
> creating a subgroup that runs all of that file's column-reading tasks.
> However, the above would currently deadlock when
> {{(# files) * (# columns) >= (# threads)}}, since every task of the root
> TaskGroup will be blocked by its subgroup's call to Finish. In order to use
> TaskGroup::MakeSubGroup for basic nested parallelism, the Finish-ing thread
> must perform work in addition to checking for group completion.
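One way the helping Finish described above might look, again as a sketch assuming a toy thread pool (`SimplePool`, `HelpingTaskGroup` and `ReadAll` are hypothetical names, not Arrow classes). Here each group keeps its own task queue; `Append` also posts a job to the pool that runs the task unless a finishing thread has already taken it, and `Finish()` drains the queue on the calling thread before waiting only for tasks already running elsewhere. `ReadAll` mimics the files-by-columns example, with each file task creating a subgroup for its columns.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Minimal fixed-size thread pool for illustration (not Arrow's ThreadPool).
class SimplePool {
 public:
  explicit SimplePool(int n) {
    for (int i = 0; i < n; ++i) workers_.emplace_back([this] { Loop(); });
  }
  ~SimplePool() {  // workers drain the queue, then exit
    { std::lock_guard<std::mutex> lk(mu_); stop_ = true; }
    cv_.notify_all();
    for (auto& t : workers_) t.join();
  }
  void Submit(std::function<void()> fn) {
    { std::lock_guard<std::mutex> lk(mu_); queue_.push(std::move(fn)); }
    cv_.notify_one();
  }
 private:
  void Loop() {
    std::unique_lock<std::mutex> lk(mu_);
    while (true) {
      cv_.wait(lk, [this] { return stop_ || !queue_.empty(); });
      if (queue_.empty()) return;  // stopping and nothing left to run
      std::function<void()> fn = std::move(queue_.front());
      queue_.pop();
      lk.unlock();
      fn();  // run the job without holding the pool lock
      lk.lock();
    }
  }
  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> queue_;
  bool stop_ = false;
  std::vector<std::thread> workers_;
};

class HelpingTaskGroup {  // hypothetical: Finish() runs queued tasks itself
 public:
  explicit HelpingTaskGroup(SimplePool* pool) : pool_(pool), st_(std::make_shared<State>()) {}
  void Append(std::function<void()> task) {
    std::unique_lock<std::mutex> lk(st_->mu);
    st_->queue.push(std::move(task));
    ++st_->pending;
    lk.unlock();
    pool_->Submit([st = st_] { RunOne(*st); });  // no-op if Finish() took it
  }
  // Run this group's queued tasks here, then wait only for ones running elsewhere.
  void Finish() {
    while (RunOne(*st_)) continue;
    std::unique_lock<std::mutex> lk(st_->mu);
    st_->cv.wait(lk, [this] { return st_->pending == 0; });
  }
 private:
  struct State {
    std::mutex mu;
    std::condition_variable cv;
    std::queue<std::function<void()>> queue;
    int pending = 0;  // appended but not yet completed
  };
  static bool RunOne(State& st) {  // returns false if nothing was queued
    std::unique_lock<std::mutex> lk(st.mu);
    if (st.queue.empty()) return false;
    std::function<void()> task = std::move(st.queue.front());
    st.queue.pop();
    lk.unlock();
    task();  // run without holding the group's lock
    lk.lock();
    if (--st.pending == 0) st.cv.notify_all();
    return true;
  }
  SimplePool* pool_;
  std::shared_ptr<State> st_;
};

// Nested parallelism: files on a root group, columns on per-file subgroups.
int ReadAll(int threads, int files, int columns) {
  std::atomic<int> columns_read{0};
  SimplePool pool(threads);  // declared before `root`, so it is destroyed after it
  HelpingTaskGroup root(&pool);
  for (int f = 0; f < files; ++f) {
    root.Append([&pool, &columns_read, columns] {
      HelpingTaskGroup sub(&pool);  // per-file subgroup
      for (int c = 0; c < columns; ++c) sub.Append([&columns_read] { ++columns_read; });
      sub.Finish();  // helps rather than parking a pool thread
    });
  }
  root.Finish();
  return columns_read.load();
}
```

With a Finish that only waited, `ReadAll(1, 3, 4)` could hang: the single pool thread would sit inside a file task's `Finish` while the column tasks it needs are queued behind it. In this sketch the waiting thread runs those tasks itself, so the call returns 12; even `ReadAll(0, 2, 3)`, with no pool threads at all, completes on the calling thread.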



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
