[ 
https://issues.apache.org/jira/browse/ARROW-13004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17359098#comment-17359098
 ] 

Antoine Pitrou commented on ARROW-13004:
----------------------------------------

> The intent is to run Transform + Consume as a single thread task on each item 
> generated by source()

Then why not just chain them directly?

 

> [C++] Allow the creation of future "chains" to better control parallelism
> -------------------------------------------------------------------------
>
>                 Key: ARROW-13004
>                 URL: https://issues.apache.org/jira/browse/ARROW-13004
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Weston Pace
>            Priority: Major
>
> This is a bit tricky to explain.  ShouldSchedule::Always works well for 
> AddCallback but falls short for Transfer and Then.  An example may explain 
> best.
> Consider three operators, Source, Transform, and Sink.  They are setup as...
> {code:java}
> source_fut = source(); // 1
> transform_fut = source_fut.Then(Transform(), ScheduleAlways); // 2
> sink_fut = transform_fut.Then(Consume()); // 3
> {code}
> The intent is to run Transform + Consume as a single thread task on each item 
> generated by source().  This is what happens if source() is slow.  If 
> source() is fast (let's pretend it's always finished) then this is not what 
> happens.
> Line 2 causes a new thread task to be launched (since source_fut is 
> finished).  It is possible that new thread task can mark transform_fut 
> finished before line 3 is executed by the original thread.  This causes 
> Consume() and Transform() to run on separate threads.
> The solution (at least as best I can come up with) is unfortunately a little 
> complex (though the complexity can be hidden in future/async_generator 
> internals).  Basically, it is worth waiting to schedule until the future 
> chain has had a chance to finish connecting the pressure.  This means a 
> future created with ScheduleAlways is created in an "unconsumed" mode.  Any 
> callbacks that would normally be launched will not be launched until the 
> future switches to "consumed".  Future.Wait(), VisitAsyncGenerator, 
> CollectAsyncGenerator, and some of the async_generator operators would cause 
> the future to be "consumed".  The "consume" signal will need to propagate 
> backwards up the chain so futures will need to keep a reference to their 
> antecedent future.
> This work meshes well with some other improvements I have been considering, 
> in particular, splitting future/promise and restricting futures to a single 
> callback.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to