[
https://issues.apache.org/jira/browse/ARROW-10117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17352944#comment-17352944
]
Weston Pace commented on ARROW-10117:
-------------------------------------
One interesting thing to note as my work on this progresses is that how
tasks are scheduled (ARROW-12903) is having a greater and greater impact on
performance. As an example, in my latest benchmarks, a
tiny reference workload runs at ~2 million tasks per second. With 8 threads
and poor scheduling we end up with 620k tasks per second (this is similar to
the baseline performance). With 8 threads and ideal scheduling we end up with
7-8 million tasks per second.
It's possible we can find some ways to better handle the worst-case scenario
but it's not something I'm going to be tackling as part of this PR.
> [C++] Implement work-stealing scheduler / multiple queues in ThreadPool
> -----------------------------------------------------------------------
>
> Key: ARROW-10117
> URL: https://issues.apache.org/jira/browse/ARROW-10117
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Wes McKinney
> Assignee: Weston Pace
> Priority: Major
>
> This involves a change from a single task queue shared amongst all threads to
> a per-thread task queue and the ability for idle threads to take tasks from
> other threads' queues (work stealing).
> As part of this, the task submission API would need to evolve in some
> fashion so that tasks related to a particular workload end up in the
> same task queue.
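
The design described in the issue (one queue per thread, with idle threads taking tasks from other threads' queues) can be illustrated with a minimal sketch. This is not Arrow's ThreadPool API: the names WorkerQueue, DrainWithStealing and RunDemo are hypothetical. The sketch also makes simplifying assumptions: the deques are mutex-protected rather than lock-free, all tasks are queued before the workers start, and tasks do not submit further tasks.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical names throughout; this is not Arrow's ThreadPool API.
using Task = std::function<void()>;

// One mutex-protected deque per worker thread.
class WorkerQueue {
 public:
  void Push(Task task) {
    std::lock_guard<std::mutex> lock(mutex_);
    tasks_.push_back(std::move(task));
  }

  // The owning worker takes the most recently pushed task.
  bool PopLocal(Task* out) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (tasks_.empty()) return false;
    *out = std::move(tasks_.back());
    tasks_.pop_back();
    return true;
  }

  // A thief takes the oldest task, from the opposite end of the deque.
  bool Steal(Task* out) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (tasks_.empty()) return false;
    *out = std::move(tasks_.front());
    tasks_.pop_front();
    return true;
  }

 private:
  std::mutex mutex_;
  std::deque<Task> tasks_;
};

// Starts one thread per queue and runs until every queue is empty.
// Returns the number of tasks each worker executed.
// Assumes all tasks were queued before the call and spawn no new tasks.
std::vector<std::size_t> DrainWithStealing(std::vector<WorkerQueue>& queues) {
  const std::size_t n = queues.size();
  std::vector<std::size_t> executed(n, 0);
  std::vector<std::thread> workers;
  for (std::size_t id = 0; id < n; ++id) {
    workers.emplace_back([&queues, &executed, id, n] {
      for (;;) {
        Task task;
        bool found = queues[id].PopLocal(&task);
        // Local queue is empty: try to steal from each other queue in turn.
        for (std::size_t k = 1; !found && k < n; ++k) {
          found = queues[(id + k) % n].Steal(&task);
        }
        if (!found) return;  // No queued work anywhere: this worker exits.
        task();
        ++executed[id];  // Each worker writes only its own slot.
      }
    });
  }
  for (std::thread& worker : workers) worker.join();
  return executed;
}

// Submits every task of one workload to queue 0, then lets the other
// workers steal. Returns how many tasks actually ran.
std::size_t RunDemo(std::size_t num_workers, std::size_t num_tasks) {
  assert(num_workers > 0);
  std::vector<WorkerQueue> queues(num_workers);
  std::atomic<std::size_t> counter{0};
  for (std::size_t i = 0; i < num_tasks; ++i) {
    queues[0].Push([&counter] { counter.fetch_add(1); });
  }
  DrainWithStealing(queues);
  return counter.load();
}
```

Calling RunDemo(4, 1000) places all 1000 tasks in a single queue, as the proposed submission API would for related work, and leaves it to the three idle workers to steal from that queue. How the tasks end up distributed across workers depends on thread timing, but each task runs exactly once.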