[ 
https://issues.apache.org/jira/browse/ARROW-12560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17333518#comment-17333518
 ] 

Weston Pace commented on ARROW-12560:
-------------------------------------

Yes, a `TransferAlways` would work.  I tried a few iterations but they didn't 
work as intended.  The thread task has to be spawned by the consumer in this 
case instead of the producer.  One way it could work is by having `Transfer` 
"mark" the future in some way so that callbacks added to the future are always 
spawned as new thread tasks.

The utility could be more generally used outside of transfer (e.g. it could be 
used with an expensive map function to get a partitioned fan-out) but the 
synchronous utilities we have (e.g. `TaskGroup`) could achieve the same thing in 
those cases.

> [C++] Investigate utilizing aggressive thread task creation when adding 
> callback to finished future
> ---------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-12560
>                 URL: https://issues.apache.org/jira/browse/ARROW-12560
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Weston Pace
>            Assignee: Weston Pace
>            Priority: Major
>              Labels: async-util
>
> Imagine there is a slow map function (that could run in parallel) and a 
> vector generator given a long vector of tasks.  If we apply map to the 
> generator and then readahead we won't actually get any parallelism because 
> the vector generator returns everything synchronously and so no thread task 
> will ever be submitted.
> This hypothetical situation is a reality in some cases in the scanner.  
> For example, if scanning CSV files and the CPU threads fall behind the I/O 
> threads then all callbacks will be synchronous (since the futures will 
> already have been completed by the I/O threads).
> In such a situation we might benefit from creating a new thread task even 
> though we wouldn't normally create one, for example when we have an idle 
> core.  You can think of this as an analogue of work stealing.
> On the other hand, creating new thread tasks at arbitrary callbacks might 
> not be efficient.  We could mitigate this by marking a callback as 
> "potentially long" when we add it, as a hint that it is eligible for eager 
> thread creation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
