pepijnve commented on PR #16196:
URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2954218719

   > As time permits, we can explore alternate, more universal strategies for 
cancellation
   > 100% agree with not merging this until we are in agreement
   
   I can't help but feel that this is needlessly being rushed. Committing to a 
new public API on an extension point before it's clearly proven feels like a 
bad idea to me. If it was purely an implementation detail it would be less of 
an issue. What's the hurry?
   
   The more I've been digging into the code over the past few days the clearer 
it is that getting yielding just right while avoiding wasteful work is 
something you need to be careful about. See for instance #16196.
   We started out with concerns that yielding needlessly would introduce 
performance overhead, but now this PR does so even when it's not necessary at 
all. Admittedly it's nowhere near as extreme as the above issue, but still 
waste is waste.
   
   Wouldn't it be prudent to give this some more time to mature and maybe see 
if there are better strategies? It's not a universal solution, but just as an 
example what I found in #16319 is that restructuring the operator code a little 
bit makes them behave much nicer from the caller's perspective. Unfortunately 
you do need to do this kind of thing at the operator implementation level. I do 
think there are implementation patterns here that could server as building 
blocks for operators. 'Build a stream async and then emit from it' for instance 
seems to be pretty common. Rather than having a bespoke implementation in each 
operator it would be useful to have a combinator that operators can use. 
Perhaps there's a similar zero cost solution to the 'drains input before first 
emit' pattern as well?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to