> 4) control is not returned to the processing thread Yes, it looks like the current implementation does not return control to the processing thread, but I think this is correct, or at least "as designed". The thread will be used to continue iterating the source.
> control is not returned to the processing thread, and instead > blocks when > marking the backpressure_future_ as finished. As Sasha said, the call to "MarkFinished" will then run callbacks. One of those callbacks (the only one in this case) then continues to iterate from the source, doing the work that was originally started by the call to StartProducing. Generally, the code will only go so far before it creates a new thread task and then control will eventually return. However, if you are running without an executor, then there are no thread tasks, all callbacks run immediately in the thread calling mark finished, and it can be rather hard to understand the logic. I'll try and draw up some sequence diagrams as an example for this and Li Jin's earlier question regarding ordering. > Anyway, I wouldn’t currently rely on PauseProducing/ResumeProducing > unfortunately. It is currently not tested anywhere as far as I can tell and > ignored by a lot of nodes (such as HashJoinNode). We do test, and rely, on the PauseProducing/ResumeProducing mechanics to implement back-pressure for the datasets API. This limits plans to scan->filter->project->sink and all of these nodes have been tested to accurately work with backpressure. I think you're free to experiment with it. I agree however, that backpressure could maybe be a more minor concern until some of the scheduler improvements are available. On Wed, Jul 20, 2022 at 3:13 PM Sasha Krassovsky <krassovskysa...@gmail.com> wrote: > > Hi, > Futures run callbacks on the thread that marks then as finished. It seems > that inside of the Source node’s generator loop does add a callback > (https://github.com/iChauster/arrow/blob/asof_join2/cpp/src/arrow/compute/exec/source_node.cc#L130 > > <https://github.com/iChauster/arrow/blob/asof_join2/cpp/src/arrow/compute/exec/source_node.cc#L130>) > which continues the loop. I’m not entirely sure myself how this code works > (this generator + control flow thing is very opaque), but my guess is that’s > what’s causing it. > > One further note, copying a Future<> actually maintains a reference to the > same underlying future, which may also be unexpected at first. Specifically > in your code, doing Future<> to_finish = backpressure_future_; > to_finish.MarkFinished(); is equivalent to just > backpressure_future_.MarkFinished(). > > Anyway, I wouldn’t currently rely on PauseProducing/ResumeProducing > unfortunately. It is currently not tested anywhere as far as I can tell and > ignored by a lot of nodes (such as HashJoinNode). Michal and I have some work > in progress involving a new scheduler with first-class support for back > pressure. > > Sasha > > > On Jul 20, 2022, at 1:49 PM, Ivan Chau <ivan.m.c...@gmail.com> wrote: > > > > backpressure_future_ >