[ https://issues.apache.org/jira/browse/ARROW-12560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17333037#comment-17333037 ]
Antoine Pitrou commented on ARROW-12560:
----------------------------------------
I don't understand. If you want parallelism, why not use the CPU thread pool?
> [C++] Investigate excessive thread creation when adding callback to finished
> future.
> ------------------------------------------------------------------------------------
>
> Key: ARROW-12560
> URL: https://issues.apache.org/jira/browse/ARROW-12560
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Weston Pace
> Assignee: Weston Pace
> Priority: Major
> Labels: async-util
>
> Imagine there is a slow map function (one that could run in parallel) and a
> vector generator that was given a long vector of tasks. If we apply the map
> to the generator and then add readahead, we won't actually get any
> parallelism: the vector generator returns everything synchronously, so no
> thread task will ever be submitted.
> This is not purely hypothetical; it happens in practice in the scanner.
> For example, if we are scanning CSV files and the CPU threads fall behind
> the I/O threads, then all callbacks will run synchronously (since the
> futures will already have been completed by the I/O threads).
> In such a situation we might benefit from creating a new thread task even
> though we wouldn't normally create one, for example when there is an idle
> core. You can think of this as an analogue of work stealing.
> On the other hand, creating a new thread task for any arbitrary callback
> might not be efficient. We could mitigate this by letting the caller mark a
> callback as "potentially long" when adding it, as a hint that the callback
> is eligible for eager thread creation.
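
The behaviour described in the issue can be sketched with a toy future. This is only an illustration under stated assumptions: `ToyFuture` and `CountInlineCallbacks` are hypothetical names, the class is single-threaded, and it is not Arrow's actual `Future` implementation. It shows that a callback added to an already-finished future runs inline in the caller, so no separate task is ever created for it.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical toy future for illustration only; this is NOT arrow::Future.
// It is single-threaded and does no locking.
class ToyFuture {
 public:
  // Completes the future and runs any callbacks that were queued earlier.
  void MarkFinished() {
    finished_ = true;
    std::vector<std::function<void()>> pending;
    pending.swap(callbacks_);
    for (auto& callback : pending) callback();
  }

  // Queues the callback if the future is still pending. If the future has
  // already finished, the callback runs right here, in the caller's context.
  void AddCallback(std::function<void()> callback) {
    if (finished_) {
      callback();
      return;
    }
    callbacks_.push_back(std::move(callback));
  }

 private:
  bool finished_ = false;
  std::vector<std::function<void()>> callbacks_;
};

// Counts how many of `n` callbacks had already run by the time AddCallback
// returned. `finish_first` models a producer that stays ahead of the consumer.
int CountInlineCallbacks(int n, bool finish_first) {
  int inline_runs = 0;
  for (int i = 0; i < n; ++i) {
    ToyFuture future;
    bool ran = false;
    if (finish_first) future.MarkFinished();
    future.AddCallback([&ran] { ran = true; });
    if (ran) ++inline_runs;
    if (!finish_first) future.MarkFinished();
  }
  return inline_runs;
}
```

In this sketch, `CountInlineCallbacks(4, true)` counts all four callbacks as inline, while `CountInlineCallbacks(4, false)` counts none. The first case corresponds to the scanner situation above, where the I/O side has already completed every future before the consumer attaches its callback.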