[ 
https://issues.apache.org/jira/browse/ARROW-7001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963437#comment-16963437
 ] 

Wes McKinney commented on ARROW-7001:
-------------------------------------

I think realistically we may need to support both styles of scheduling.

To make the problem more concrete, I think we should focus on the needs of 
reading multiple files that are capable of parallelizing at the file level, 
for example Parquet files. Not only do we parallelize at the column level, but 
within a thread reading a column, we perform IO calls which may block. Somehow 
we need to avoid tying up a CPU core while an IO call is pending. Do we need 
to do a major refactoring? Or can the current code be retrofitted with a 
suspend-task-resume-task API that allows other tasks to be started while a 
task is waiting on IO?

Facebook's Folly library has a fibers / coroutines library that may provide 
some ideas about what kind of programming models make sense for certain 
applications:

https://github.com/facebook/folly/tree/master/folly/fibers

> [C++] Develop threading APIs to accommodate nested parallelism 
> ---------------------------------------------------------------
>
>                 Key: ARROW-7001
>                 URL: https://issues.apache.org/jira/browse/ARROW-7001
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Wes McKinney
>            Priority: Major
>
> Tasks invoked in parallel may be able to submit their own subtasks, which in 
> OpenMP and TBB documentation is often called "nested parallelism". 
> If a task blocks on the completion of subtasks, then outright deadlocks are 
> possible -- running tasks are all blocking on their subtasks, but the thread 
> pool will not schedule any further tasks.
> I suggest that such code have a way to indicate to the thread pool (if one is 
> passed in) that it is blocking on the completion of other tasks so that 
> further tasks can be run while the task waits for its child tasks to 
> complete. One possible way to do this is to have a floating "soft limit" for 
> concurrent tasks that can be incremented when tasks are waiting. 
> So if we normally allow 8 concurrent tasks, then this can be temporarily 
> increased for each "suspended" task. Preferably we would provide some way for 
> the dependent task group to "awaken" the suspended task so that it does not 
> have to do any work while waiting for the task group to finish.
> Note this feature can also be used in tasks that are waiting for IO calls.
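
To illustrate the floating "soft limit" idea described in the issue, here is a
minimal C++ sketch. It is not Arrow's thread pool and not a proposed design:
SoftLimitPool, Submit, Await and RunNestedExample are hypothetical names
invented for this example. It assumes tasks return int, that a task waits on
its children only through Await, and that child tasks do not throw (an
exception escaping Await would leave the limit raised).

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical sketch of a pool with a floating soft limit on concurrency.
class SoftLimitPool {
 public:
  explicit SoftLimitPool(std::size_t base_limit) : limit_(base_limit) {
    assert(base_limit > 0);
    std::lock_guard<std::mutex> lock(mu_);
    for (std::size_t i = 0; i < base_limit; ++i) SpawnWorkerLocked();
  }

  ~SoftLimitPool() {
    std::unique_lock<std::mutex> lock(mu_);
    stop_ = true;
    cv_.notify_all();
    // Join one thread at a time: a still-running task may spawn more workers.
    while (!threads_.empty()) {
      std::thread t = std::move(threads_.back());
      threads_.pop_back();
      lock.unlock();
      t.join();
      lock.lock();
    }
  }

  std::future<int> Submit(std::function<int()> fn) {
    auto task = std::make_shared<std::packaged_task<int()>>(std::move(fn));
    std::future<int> result = task->get_future();
    {
      std::lock_guard<std::mutex> lock(mu_);
      queue_.emplace_back([task] { (*task)(); });
    }
    cv_.notify_one();
    return result;
  }

  // Called from inside a task: raise the soft limit while waiting on a child
  // so that another queued task can run, then lower it again.
  int Await(std::future<int>& child) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      ++limit_;
      if (threads_.size() < limit_) SpawnWorkerLocked();
    }
    cv_.notify_one();
    int value = child.get();  // blocks this thread, but not the pool
    std::lock_guard<std::mutex> lock(mu_);
    --limit_;
    return value;
  }

 private:
  void SpawnWorkerLocked() { threads_.emplace_back([this] { WorkerLoop(); }); }

  void WorkerLoop() {
    std::unique_lock<std::mutex> lock(mu_);
    for (;;) {
      cv_.wait(lock, [this] {
        return (stop_ && queue_.empty()) ||
               (!queue_.empty() && running_ < limit_);
      });
      if (queue_.empty()) return;  // only reachable once stop_ is set
      std::function<void()> task = std::move(queue_.front());
      queue_.pop_front();
      ++running_;  // counts suspended tasks too; the raised limit offsets them
      lock.unlock();
      task();
      lock.lock();
      --running_;
      cv_.notify_all();
    }
  }

  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> queue_;
  std::vector<std::thread> threads_;
  std::size_t limit_;
  std::size_t running_ = 0;
  bool stop_ = false;
};

// Four parent tasks on a pool of two; each blocks on a child it submitted.
int RunNestedExample() {
  SoftLimitPool pool(2);
  std::vector<std::future<int>> parents;
  for (int i = 0; i < 4; ++i) {
    parents.push_back(pool.Submit([&pool, i] {
      std::future<int> child = pool.Submit([i] { return i * 10; });
      return pool.Await(child) + 1;
    }));
  }
  int total = 0;
  for (auto& p : parents) total += p.get();
  return total;
}
```

With a fixed limit of two and a plain blocking wait, the first two parents
would occupy both workers and their children would never be scheduled, which
is the deadlock described above. The sketch still parks an OS thread for every
suspended task; the "awaken" mechanism suggested in the issue, or a
fiber-style approach, would avoid that cost.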



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
