[
https://issues.apache.org/jira/browse/ARROW-13004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17359536#comment-17359536
]
Weston Pace commented on ARROW-13004:
-------------------------------------
Some of the async_generator.h operators (e.g. the merge or resequencing
operators) aren't so easy to chain directly. That said, I will hold off on
working on this until the ExecPlan work is more stable. It's
possible the scan node can just adopt the tools from ExecPlan and build up a
graph instead of using async_generator.h.
> [C++] Allow the creation of future "chains" to better control parallelism
> -------------------------------------------------------------------------
>
> Key: ARROW-13004
> URL: https://issues.apache.org/jira/browse/ARROW-13004
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Weston Pace
> Assignee: Weston Pace
> Priority: Major
>
> This is a bit tricky to explain. ShouldSchedule::Always works well for
> AddCallback but falls short for Transfer and Then. An example may explain
> best.
> Consider three operators, Source, Transform, and Sink. They are set up as follows:
> {code:java}
> source_fut = source(); // 1
> transform_fut = source_fut.Then(Transform(), ScheduleAlways); // 2
> sink_fut = transform_fut.Then(Consume()); // 3
> {code}
> The intent is to run Transform + Consume as a single thread task on each item
> generated by source(). This is what happens if source() is slow. If
> source() is fast (let's pretend it's always finished) then this is not what
> happens.
> Line 2 causes a new thread task to be launched (since source_fut is
> finished). It is possible that new thread task can mark transform_fut
> finished before line 3 is executed by the original thread. This causes
> Consume() and Transform() to run on separate threads.
> The solution (at least as best I can come up with) is unfortunately a little
> complex (though the complexity can be hidden in future/async_generator
> internals). Basically, it is worth waiting to schedule until the future
> chain has had a chance to finish connecting. This means a
> future created with ScheduleAlways is created in an "unconsumed" mode. Any
> callbacks that would normally be launched will not be launched until the
> future switches to "consumed". Future.Wait(), VisitAsyncGenerator,
> CollectAsyncGenerator, and some of the async_generator operators would cause
> the future to be "consumed". The "consume" signal will need to propagate
> backwards up the chain so futures will need to keep a reference to their
> antecedent future.
> This work meshes well with some other improvements I have been considering,
> in particular, splitting future/promise and restricting futures to a single
> callback.
>
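The race and the proposed "unconsumed" mode from the quoted description can be sketched with a toy model. This is only an illustration under stated assumptions, not Arrow's implementation: `ToyExecutor`, `ToyFuture`, `Then`, `Trace`, and `RunChain` are hypothetical names, the "thread pool" is a single-threaded queue that gives each task an id (0 stands for the original thread), and the worst-case interleaving is forced by draining the queue between line 2 and line 3.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <utility>

// Toy stand-in for a thread pool. Each queued task gets a fresh id;
// id 0 means "the original calling thread".
struct ToyExecutor {
  std::deque<std::function<void()>> queue;
  int current_task = 0;
  int next_id = 1;

  void Spawn(std::function<void()> fn) { queue.push_back(std::move(fn)); }

  void RunAll() {
    while (!queue.empty()) {
      auto fn = std::move(queue.front());
      queue.pop_front();
      current_task = next_id++;
      fn();
      current_task = 0;
    }
  }
};

// Hypothetical single-callback future (not arrow::Future).
struct ToyFuture {
  ToyExecutor* exec = nullptr;
  bool finished = false;
  bool consumed = true;                   // deferred links start as false
  std::shared_ptr<ToyFuture> antecedent;  // lets "consume" walk backwards
  std::function<void()> pending_launch;   // task held back while unconsumed
  std::function<void()> continuation;     // runs inline when this finishes

  void MarkFinished() {
    finished = true;
    if (continuation) continuation();
  }

  void Consume() {
    assert(exec != nullptr);
    consumed = true;
    if (antecedent) antecedent->Consume();  // propagate up the chain first
    if (pending_launch) {
      auto launch = std::move(pending_launch);
      pending_launch = nullptr;
      exec->Spawn(std::move(launch));
    }
  }
};

using FuturePtr = std::shared_ptr<ToyFuture>;

// always_schedule plays the role of ScheduleAlways; defer selects the
// proposed "unconsumed" behaviour. The raw pointer is safe here only
// because the demo keeps every future alive until the queue is drained.
FuturePtr Then(const FuturePtr& prev, std::function<void()> fn,
               bool always_schedule, bool defer) {
  auto next = std::make_shared<ToyFuture>();
  next->exec = prev->exec;
  next->antecedent = prev;
  ToyFuture* raw = next.get();
  auto run = [raw, fn = std::move(fn)] { fn(); raw->MarkFinished(); };
  if (!prev->finished) {
    prev->continuation = run;      // runs on whichever task finishes prev
  } else if (!always_schedule) {
    run();                         // already finished: run on the caller
  } else if (defer) {
    next->consumed = false;        // hold the task until consumed
    next->pending_launch = run;
  } else {
    next->exec->Spawn(run);        // current behaviour: launch right away
  }
  return next;
}

struct Trace { int transform_task = -1; int consume_task = -1; };

Trace RunChain(bool defer) {
  ToyExecutor exec;
  Trace trace;
  auto source_fut = std::make_shared<ToyFuture>();
  source_fut->exec = &exec;
  source_fut->finished = true;  // "fast" source, already finished     // 1
  auto transform_fut = Then(
      source_fut, [&] { trace.transform_task = exec.current_task; },
      /*always_schedule=*/true, defer);                                // 2
  exec.RunAll();  // worst case: the pool gets ahead of the original thread
  auto sink_fut = Then(
      transform_fut, [&] { trace.consume_task = exec.current_task; },
      /*always_schedule=*/false, defer);                               // 3
  sink_fut->Consume();  // stands in for Wait() / VisitAsyncGenerator
  exec.RunAll();
  return trace;
}
```

In this model, `RunChain(false)` records Transform in a spawned task and Consume on the original thread, which is the split described in the issue. `RunChain(true)` records both in the same spawned task, because nothing is launched until `Consume()` has walked back up the chain.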