ozankabak commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2943759107
> @ozankabak I still don't think you need all this API work since there's a zero API change way to deal with cancellation already. Tests all pass with no API changes in the 'all Stream implementations must be well behaved Tokio citizens' approach. I understand the performance concern, but maybe it's a bit premature to design APIs before knowing what the actual performance impact is? In terms of code changes I don't think the complexity argument holds since the required code changes were fairly trivial. > I've found a dedicated machine to run the benchmarks on in the meantime. It's 10 year old hardware (Xeon E5620) so the compile times take forever, but should be good enough for relative comparisons. Will post results when I get them. @pepijnve, I respect your opinion but we will need to agree to disagree. After spending a lot of time (and writing a lot of upstream *and* downstream DF code) over the last few years for leveraging the async runtime and its challenges/advantages, issues related to pipeline-breaking, performance implications of these things, the APIs `ExecutionPlan` objects should provide, responsibilities of the planner vs. operators and others, my intuition tells me that: 1. There is some information the `ExecutionPlan` API doesn't expose yet about input pipelining behavior and propagation of pendings, and it should -- not just for this use case, but others too. 2. There is a way to solve this problem universally with optimally minimal overhead and it is not that hard to figure out. 3. This way will also help us reduce the responsibility of user-defined operators, and solve cancellability even without their strict cooperation. 4. There is a lot of downstream users who define user-defined operators, and any "win" (in the above sense) for such use cases is important for our project goals. 5. I suspect (and my confidence is somewhat lower on this point relative to others) we will always be able to construct some cases, however contrived, where an "everyone always yields" solution will suffer from performance problems. 6. My intuition could be wrong, and if this thesis (1-2-3) indeed turns out to be wrong, we can take the learnings and fall back to another solution, where your proposal would be a good candidate. I feel like we are getting close to a point where we start having not-so-fruitful discussions. I think I have made a good effort to make my arguments and reasoning clear. We will see where this effort goes (and I'm very hopeful that we will succeed), and if you want to help, we will always take it with appreciation. I think you can probably relate to the position that I would like to focus my thinking on getting 1-2-3 done (if possible) in the short-term, instead of repeatedly spending time on justifying what we are doing. If this route doesn't work, I will gladly help with finding another solution (and maybe that will your approach). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org