Weston Pace created ARROW-12560:
-----------------------------------
Summary: [C++] Investigate excessive thread creation when adding
callback to finished future.
Key: ARROW-12560
URL: https://issues.apache.org/jira/browse/ARROW-12560
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Weston Pace
Assignee: Weston Pace
Imagine a slow map function (one that could run in parallel) and a vector
generator backed by a long vector of tasks. If we apply the map to the
generator and then add readahead, we won't actually get any parallelism: the
vector generator returns every item synchronously, as an already-finished
future, so no thread task is ever submitted.
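To make the synchronous behaviour concrete, here is a minimal single-threaded
sketch. ToyFuture and CountInlineCallbacks are hypothetical names used only for
illustration; this is not arrow::Future, and it assumes only that a callback
added to an already-finished future runs immediately on the calling thread.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical toy future (not arrow::Future). Single-threaded, no locking.
class ToyFuture {
 public:
  // Marks the future finished and runs any callbacks that were waiting.
  void MarkFinished() {
    finished_ = true;
    std::vector<std::function<void()>> pending;
    pending.swap(callbacks_);
    for (auto& cb : pending) cb();
  }

  // Returns true if the callback ran inline (synchronously) inside this call.
  bool AddCallback(std::function<void()> cb) {
    if (!finished_) {
      callbacks_.push_back(std::move(cb));  // deferred until MarkFinished()
      return false;
    }
    cb();  // already finished: runs on the caller's thread, no task submitted
    return true;
  }

 private:
  bool finished_ = false;
  std::vector<std::function<void()>> callbacks_;
};

// Mimics a vector generator: every future it hands out is already finished,
// so the mapped callback always runs inline on the calling thread.
std::size_t CountInlineCallbacks(std::size_t num_items) {
  std::size_t inline_count = 0;
  for (std::size_t i = 0; i < num_items; ++i) {
    ToyFuture fut;
    fut.MarkFinished();
    if (fut.AddCallback([] { /* the slow map function would run here */ })) {
      ++inline_count;
    }
  }
  return inline_count;
}
```

In this sketch every callback runs inline, so the slow map function would
execute serially on whichever thread pulls from the generator.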
This is not just hypothetical; it happens in some cases in the scanner. For
example, when scanning CSV files, if the CPU threads fall behind the I/O
threads then all callbacks will be synchronous (since the futures will already
have been completed by the I/O threads).
In such a situation we might benefit from creating a new thread task even
though we wouldn't normally create one, for example when a core is idle. You
can think of this as an analogue of work stealing.
On the other hand, creating a new thread task at every arbitrary callback
might not be efficient. We could mitigate this by letting the caller mark a
callback as "potentially long" when adding it, as a hint that the callback is
eligible for eager thread creation.
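One possible shape for such a hint is sketched below. Everything here is an
assumption made for illustration: CallbackHint, ToyExecutor and HintedFuture
are hypothetical names, not existing Arrow APIs, and the executor is a plain
task queue standing in for a CPU thread pool.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical hint; not an existing Arrow option.
enum class CallbackHint { kShort, kPotentiallyLong };

// Hypothetical stand-in for a CPU thread pool: Spawn() queues a task and
// Drain() plays the role of a worker thread picking the tasks up.
class ToyExecutor {
 public:
  void Spawn(std::function<void()> task) { queue_.push_back(std::move(task)); }
  std::size_t num_queued() const { return queue_.size(); }
  void Drain() {
    while (!queue_.empty()) {
      std::function<void()> task = std::move(queue_.front());
      queue_.pop_front();
      task();
    }
  }

 private:
  std::deque<std::function<void()>> queue_;
};

// Hypothetical toy future (single-threaded, no locking).
class HintedFuture {
 public:
  explicit HintedFuture(ToyExecutor* executor) : executor_(executor) {}

  void MarkFinished() {
    finished_ = true;
    std::vector<std::function<void()>> pending;
    pending.swap(callbacks_);
    for (auto& cb : pending) cb();
  }

  void AddCallback(std::function<void()> cb,
                   CallbackHint hint = CallbackHint::kShort) {
    if (!finished_) {
      callbacks_.push_back(std::move(cb));  // normal path: wait for completion
      return;
    }
    if (hint == CallbackHint::kPotentiallyLong) {
      executor_->Spawn(std::move(cb));  // eager task instead of running inline
    } else {
      cb();  // short callback: keep today's synchronous behaviour
    }
  }

 private:
  ToyExecutor* executor_;
  bool finished_ = false;
  std::vector<std::function<void()>> callbacks_;
};
```

A fuller version would presumably also check whether a core is actually idle
before spawning, as suggested above; this sketch spawns unconditionally
whenever the hint is set and the future is already finished.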
--
This message was sent by Atlassian Jira
(v8.3.4#803005)