Wes McKinney created ARROW-8667:
-----------------------------------
Summary: [C++] Add multi-consumer Scheduler API to sit one layer
above ThreadPool
Key: ARROW-8667
URL: https://issues.apache.org/jira/browse/ARROW-8667
Project: Apache Arrow
Issue Type: New Feature
Components: C++
Reporter: Wes McKinney
Fix For: 1.0.0
I believe we should define an abstraction to allow for custom resource
allocation strategies (round robin, even time, etc.) to be devised for
situations where there are different thread pool consumers that are working
independently of each other.
Consider the classic nested parallelism scenario:
* Task A in thread 1 may issue N subtasks that run in parallel
* Task B in thread 2 may issue K subtasks
With our current ThreadPool abstraction, it is easy to conceive of scenarios
where Task A and Task B trample each other.
One approach to remedy this problem is to have an API like so:
{code}
// Inform the scheduler that you want to submit tasks that are "your tasks"
int consumer_id = scheduler->NewConsumer();
for (...) {
  Future<T> fut = scheduler->Submit(consumer_id, DoWork, ...);
}
scheduler->FinishConsumer(consumer_id);
{code}
The idea is that the scheduler would maintain separate task queues for each
consumer and e.g. track consumer-specific metrics of interest to determine how
tasks are allocated.
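To make the per-consumer queue idea concrete, here is a minimal single-threaded sketch of how the bookkeeping might look. Everything in it is hypothetical: the class name RoundRobinScheduler, the RunOne and NumConsumers helpers, and the use of std::function<void()> in place of Future<T> are assumptions for illustration, not existing Arrow APIs. A real implementation would hand tasks to ThreadPool worker threads and would need locking.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <map>
#include <utility>

// Hypothetical sketch, not an Arrow class: one FIFO queue per consumer,
// drained in round-robin order by whoever calls RunOne().
class RoundRobinScheduler {
 public:
  // Register a new consumer and return its id.
  int NewConsumer() {
    int id = next_id_++;
    consumers_[id];  // default-constructs an empty queue
    return id;
  }

  // Enqueue a task on the consumer's own queue.
  void Submit(int consumer_id, std::function<void()> task) {
    auto it = consumers_.find(consumer_id);
    assert(it != consumers_.end() && "unknown consumer");
    assert(!it->second.finished && "consumer already finished");
    it->second.tasks.push_back(std::move(task));
  }

  // No more submissions will come; the queue is dropped once it is drained.
  void FinishConsumer(int consumer_id) {
    auto it = consumers_.find(consumer_id);
    assert(it != consumers_.end() && "unknown consumer");
    it->second.finished = true;
    if (it->second.tasks.empty()) consumers_.erase(it);
  }

  // Run one task from the next consumer (after the one served last) that
  // has pending work. Returns false if no consumer has a pending task.
  bool RunOne() {
    auto it = consumers_.upper_bound(last_served_);
    for (std::size_t i = 0; i < consumers_.size(); ++i) {
      if (it == consumers_.end()) it = consumers_.begin();  // wrap around
      if (!it->second.tasks.empty()) {
        std::function<void()> task = std::move(it->second.tasks.front());
        it->second.tasks.pop_front();
        last_served_ = it->first;
        // Clean up before running, so the task cannot invalidate `it`.
        if (it->second.finished && it->second.tasks.empty()) {
          consumers_.erase(it);
        }
        task();
        return true;
      }
      ++it;
    }
    return false;
  }

  std::size_t NumConsumers() const { return consumers_.size(); }

 private:
  struct ConsumerState {
    std::deque<std::function<void()>> tasks;
    bool finished = false;
  };
  std::map<int, ConsumerState> consumers_;
  int next_id_ = 0;
  int last_served_ = -1;
};
```

With two consumers that each have pending tasks, repeated calls to RunOne alternate between them, so neither consumer can fill the pool with its own subtasks. Per-consumer metrics would naturally live in ConsumerState.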
The scheduler could have different logic to control tasks being assigned to
worker threads:
* Round-robin
* Even-time allocation (run fewer tasks from consumers with "slow" tasks and
more tasks from consumers with "fast" tasks -- though there are some nuances
here, like avoiding starving a consumer that has been running a lot of "slow"
tasks when a "fast" consumer shows up)
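One possible way to approach even-time allocation, including the starvation nuance, is to track accumulated run time per consumer and always serve the consumer that has used the least. The sketch below is an assumption about how that could work, not a settled design: the EvenTimePolicy class and its methods are hypothetical names, the caller is assumed to report each task's elapsed time via Charge, and every registered consumer is assumed to have pending work. A newcomer starts at the smallest accumulated time among existing consumers rather than at zero, which is similar in spirit to virtual-runtime fair schedulers and keeps a late "fast" consumer from monopolizing workers while it catches up.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <map>

// Hypothetical bookkeeping for an even-time policy; not an Arrow class.
class EvenTimePolicy {
 public:
  using Duration = std::chrono::nanoseconds;

  // Register a consumer. It starts at the minimum accumulated time of the
  // existing consumers (zero if there are none), so it is not owed a large
  // backlog of time relative to long-running consumers.
  void AddConsumer(int consumer_id) {
    assert(used_.count(consumer_id) == 0 && "consumer already registered");
    Duration start = Duration::zero();
    if (!used_.empty()) {
      start = Duration::max();
      for (const auto& kv : used_) start = std::min(start, kv.second);
    }
    used_[consumer_id] = start;
  }

  void RemoveConsumer(int consumer_id) { used_.erase(consumer_id); }

  // Record that a task belonging to this consumer ran for `elapsed`.
  void Charge(int consumer_id, Duration elapsed) {
    used_.at(consumer_id) += elapsed;
  }

  // Accumulated (possibly offset) run time for a consumer.
  Duration Used(int consumer_id) const { return used_.at(consumer_id); }

  // Id of the consumer with the least accumulated time, or -1 if there are
  // no consumers. Ties go to the lowest id.
  int PickNext() const {
    int best = -1;
    Duration best_used = Duration::max();
    for (const auto& kv : used_) {
      if (kv.second < best_used) {
        best = kv.first;
        best_used = kv.second;
      }
    }
    return best;
  }

 private:
  std::map<int, Duration> used_;
};
```

Under this policy a consumer whose tasks take ten times longer gets roughly one tenth as many task slots, so both consumers receive about the same share of worker time.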