Ben Kietzman created ARROW-10014:
------------------------------------

             Summary: [C++] TaskGroup::Finish should execute tasks
                 Key: ARROW-10014
                 URL: https://issues.apache.org/jira/browse/ARROW-10014
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
    Affects Versions: 1.0.1
            Reporter: Ben Kietzman
            Assignee: Ben Kietzman
             Fix For: 2.0.0


Currently ThreadedTaskGroup::Finish blocks the calling thread while it waits 
for worker threads to execute tasks. Instead it could pop tasks from the queue 
and execute them, using the Finish-ing thread as a worker. This would enable 
basic nested parallelism using TaskGroup::MakeSubGroup() without the risk of 
deadlock.
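
A minimal sketch of the idea, not Arrow's implementation: SketchPool, 
SketchGroup, TryRunOne and RunNested are hypothetical names invented for 
illustration, and the real TaskGroup also tracks Status and owns its 
subgroups. It assumes a single shared FIFO queue that both the workers and a 
Finish-ing thread can pop from.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-size pool with one shared FIFO queue (not Arrow's ThreadPool).
class SketchPool {
 public:
  explicit SketchPool(int num_threads) {
    for (int i = 0; i < num_threads; ++i) workers_.emplace_back([this] { Work(); });
  }
  ~SketchPool() {
    { std::lock_guard<std::mutex> lock(mutex_); stop_ = true; }
    wakeup_.notify_all();
    for (auto& worker : workers_) worker.join();
  }
  void Submit(std::function<void()> task) {
    std::lock_guard<std::mutex> lock(mutex_);
    queue_.push_back(std::move(task));
    wakeup_.notify_one();
  }
  // Pops one task and runs it on the calling thread; false if the queue was empty.
  bool TryRunOne() {
    std::function<void()> task;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      if (queue_.empty()) return false;
      task = std::move(queue_.front());
      queue_.pop_front();
    }
    task();
    return true;
  }
 private:
  void Work() {
    for (;;) {
      {
        std::unique_lock<std::mutex> lock(mutex_);
        wakeup_.wait(lock, [this] { return stop_ || !queue_.empty(); });
        if (stop_ && queue_.empty()) return;
      }
      TryRunOne();  // another thread may have taken the task; then wait again
    }
  }
  std::mutex mutex_;
  std::condition_variable wakeup_;
  std::deque<std::function<void()>> queue_;
  bool stop_ = false;
  std::vector<std::thread> workers_;
};

// Hypothetical group whose Finish() drains the queue instead of only waiting.
class SketchGroup {
 public:
  explicit SketchGroup(SketchPool* pool) : pool_(pool) {}
  void Append(std::function<void()> task) {
    pending_.fetch_add(1);
    pool_->Submit([this, task = std::move(task)] {
      task();
      pending_.fetch_sub(1);  // last access to the group from this task
    });
  }
  void Finish() {
    // Act as a worker while tasks are outstanding; yield when nothing is queued.
    while (pending_.load() > 0) {
      if (!pool_->TryRunOne()) std::this_thread::yield();
    }
  }
 private:
  SketchPool* pool_;
  std::atomic<int> pending_{0};
};

// Runs num_files outer tasks that each fan out num_columns inner tasks through
// a nested group on the same pool; returns the number of inner tasks that ran.
int RunNested(int num_threads, int num_files, int num_columns) {
  assert(num_threads > 0);
  SketchPool pool(num_threads);
  std::atomic<int> columns_read{0};
  SketchGroup files(&pool);
  for (int f = 0; f < num_files; ++f) {
    files.Append([&pool, &columns_read, num_columns] {
      SketchGroup columns(&pool);
      for (int c = 0; c < num_columns; ++c) {
        columns.Append([&columns_read] { columns_read.fetch_add(1); });
      }
      columns.Finish();  // runs queued tasks rather than parking this worker
    });
  }
  files.Finish();
  return columns_read.load();
}
```

In this sketch RunNested(1, 4, 3) completes even with a single worker, 
because any thread inside Finish keeps executing queued tasks. The trade-off 
is that a Finish-ing thread may run tasks belonging to other groups, so 
nested calls can stack on one thread.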

For example in the case of reading multiple parquet files we would like to 
parallelize both across files to read and across columns within each file. We 
could support this basic nested parallelism by rewriting ParquetFileReader to 
accept any TaskGroup across which to scatter its column reading tasks (rather 
than instantiating its own ThreadPool based on a boolean flag). Then file 
reading tasks could be scattered across a ThreadedTaskGroup, each of these 
creating a subgroup which runs all column reading tasks.
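
The proposed reader interface might look roughly like the following sketch. 
SketchTaskGroup, SerialGroup and ReadColumns are hypothetical stand-ins, not 
the actual Arrow or Parquet APIs, and the "column read" is simulated by 
building a string. The point is only that the reader scatters its column 
tasks across whatever group the caller passes in.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical interface standing in for Arrow's TaskGroup.
class SketchTaskGroup {
 public:
  virtual ~SketchTaskGroup() = default;
  virtual void Append(std::function<void()> task) = 0;
  virtual void Finish() = 0;
};

// Serial implementation: each task runs immediately on the appending thread.
class SerialGroup : public SketchTaskGroup {
 public:
  void Append(std::function<void()> task) override { task(); }
  void Finish() override {}
};

// Hypothetical reader entry point: the caller chooses the group, so the
// reader no longer decides on its own whether or how to parallelize.
std::vector<std::string> ReadColumns(const std::string& file, int num_columns,
                                     SketchTaskGroup* group) {
  assert(group != nullptr);
  std::vector<std::string> columns(num_columns);
  for (int c = 0; c < num_columns; ++c) {
    // Each task writes a distinct slot, so tasks do not race with each other.
    group->Append([&columns, &file, c] {
      columns[c] = file + ":col" + std::to_string(c);
    });
  }
  group->Finish();  // all column tasks are complete once this returns
  return columns;
}
```

A caller reading several files would pass a subgroup of its own threaded 
group here in place of SerialGroup.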

However, the above would currently deadlock when reading {{(# files) * (# 
columns) >= (# threads)}}, since every task of the root TaskGroup will be 
blocked in its subgroup's call to Finish. In order to use 
TaskGroup::MakeSubGroup for basic nested parallelism, the Finish-ing thread 
must perform work in addition to checking for group completion.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
