pepijnve commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2954218719
> As time permits, we can explore alternate, more universal strategies for cancellation > 100% agree with not merging this until we are in agreement I can't help but feel that this is needlessly being rushed. Committing to a new public API on an extension point before it's clearly proven feels like a bad idea to me. If it was purely an implementation detail it would be less of an issue. What's the hurry? The more I've been digging into the code over the past few days the clearer it is that getting yielding just right while avoiding wasteful work is something you need to be careful about. See for instance #16196. We started out with concerns that yielding needlessly would introduce performance overhead, but now this PR does so even when it's not necessary at all. Admittedly it's nowhere near as extreme as the above issue, but still waste is waste. Wouldn't it be prudent to give this some more time to mature and maybe see if there are better strategies? It's not a universal solution, but just as an example what I found in #16319 is that restructuring the operator code a little bit makes them behave much nicer from the caller's perspective. Unfortunately you do need to do this kind of thing at the operator implementation level. I do think there are implementation patterns here that could server as building blocks for operators. 'Build a stream async and then emit from it' for instance seems to be pretty common. Rather than having a bespoke implementation in each operator it would be useful to have a combinator that operators can use. Perhaps there's a similar zero cost solution to the 'drains input before first emit' pattern as well? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org