alamb commented on PR #19360:
URL: https://github.com/apache/datafusion/pull/19360#issuecomment-3739273608

   I am just coming back to this issue, as it is affecting some of our 
customers (who issue a big query, often by accident, and then the fact that the 
queries don't cancel ties up a non-trivial amount of their resources). We have 
deployed this patch into production and will report back on how it works.
   
   The full issue is explained quite well in this blog (if I immodestly say so 
myself): https://datafusion.apache.org/blog/2025/06/30/cancellation/
   
   > So in this example, what's the appropriate fix to achieve that? There are 
really only two options:
   
   In my mind, the two options map to:
   1. **HAVE EACH OPERATOR CHECK AT OUTPUT**: ensure every possible input value 
returns `Poll::Pending` periodically
   2. **HAVE EACH OPERATOR CHECK AT INPUT**: adapt the `drop_all` function so that 
it returns `Poll::Pending` periodically itself
   
   I think this PR effectively starts us down the path of option 1 (checking at 
the output of each operator).
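
   To make option 1 concrete, here is a minimal std-only sketch. Everything in it is hypothetical and for illustration only: `RowStream` is a simplified stand-in for a real stream trait, `YieldEvery` is an assumed wrapper name, and the budget of 128 is an arbitrary value. It is not the code in this PR, nor Tokio's actual budget mechanism.

```rust
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Simplified stand-in for a stream trait (hypothetical, no `Pin`).
trait RowStream {
    fn poll_next(&mut self, cx: &mut Context<'_>) -> Poll<Option<u64>>;
}

/// A source that is always ready, so a consumer looping over it never yields.
struct AlwaysReady(std::ops::Range<u64>);

impl RowStream for AlwaysReady {
    fn poll_next(&mut self, _cx: &mut Context<'_>) -> Poll<Option<u64>> {
        Poll::Ready(self.0.next())
    }
}

/// Option 1: check at the *output* of an operator. After `budget` polls the
/// wrapper returns `Poll::Pending` once, then resets its counter.
struct YieldEvery<S> {
    inner: S,
    budget: u32,
    remaining: u32,
}

impl<S: RowStream> RowStream for YieldEvery<S> {
    fn poll_next(&mut self, cx: &mut Context<'_>) -> Poll<Option<u64>> {
        if self.remaining == 0 {
            self.remaining = self.budget;
            cx.waker().wake_by_ref(); // ask to be polled again right away
            return Poll::Pending;
        }
        self.remaining -= 1;
        self.inner.poll_next(cx)
    }
}

/// A draining consumer with no yield logic of its own: it returns to its
/// caller only when the input is exhausted or the input reports `Pending`.
fn poll_count<S: RowStream>(input: &mut S, count: &mut u64, cx: &mut Context<'_>) -> Poll<u64> {
    loop {
        match input.poll_next(cx) {
            Poll::Ready(Some(_)) => *count += 1,
            Poll::Ready(None) => return Poll::Ready(*count),
            Poll::Pending => return Poll::Pending,
        }
    }
}

/// No-op waker so the sketch can poll by hand without an async runtime.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls to completion; returns (row count, number of `Pending` yields).
fn drive<S: RowStream>(mut input: S) -> (u64, usize) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let (mut count, mut yields) = (0u64, 0usize);
    loop {
        match poll_count(&mut input, &mut count, &mut cx) {
            Poll::Ready(n) => return (n, yields),
            // Each yield is a point where a runtime could cancel the task.
            Poll::Pending => yields += 1,
        }
    }
}
```

   With a bare `AlwaysReady(0..1000)`, `drive` reports zero yields, which is the uncancellable case. Wrapping the same source in `YieldEvery` with a budget of 128 gives the driver periodic chances to stop, without touching the consumer.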
   
   The more I think about this, the more I think I understand that @pepijnve is 
suggesting that we should add the yield checking to the aggregation (and 
other stream-draining operators):
   
   > The alternative of changing the aggregation operator (and possibly other 
stream draining operators) would be option 2.
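
   A rough std-only sketch of what option 2 could look like: the draining operator counts how many input items it has consumed and returns `Poll::Pending` itself once a budget is used up. `CountAll`, `BUDGET`, and `drive` are hypothetical names, the input is modeled as a plain iterator that is always ready, and the budget value is arbitrary. This is an illustration of the idea, not DataFusion's aggregation code.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Assumed budget: input items consumed before the operator yields.
const BUDGET: u32 = 128;

/// Hypothetical draining operator: counts every row of an always-ready input.
struct CountAll<I> {
    input: I,
    count: u64,
    remaining: u32,
}

impl<I: Iterator<Item = u64> + Unpin> Future for CountAll<I> {
    type Output = u64;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u64> {
        loop {
            // Option 2: the check lives at the operator's *input* loop.
            if self.remaining == 0 {
                self.remaining = BUDGET;
                cx.waker().wake_by_ref(); // ask to be polled again right away
                return Poll::Pending;
            }
            self.remaining -= 1;
            match self.input.next() {
                Some(_) => self.count += 1,
                None => return Poll::Ready(self.count),
            }
        }
    }
}

/// No-op waker so the sketch can poll by hand without an async runtime.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls to completion; returns (output, number of `Pending` yields).
fn drive<F: Future + Unpin>(mut fut: F) -> (F::Output, usize) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut yields = 0;
    loop {
        match Pin::new(&mut fut).poll(&mut cx) {
            Poll::Ready(out) => return (out, yields),
            // Each yield is a point where a runtime could cancel the task.
            Poll::Pending => yields += 1,
        }
    }
}
```

   For example, `drive(CountAll { input: 0..1000u64, count: 0, remaining: BUDGET })` counts all 1000 rows while yielding several times along the way. The trade-off compared with option 1 is that only operators that drain their input need the check, but each such operator has to remember to include it.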
   
   I will look into this idea (checking the budget at input for operators) and 
see what it might look like, as well as trying to make the cancellation 
suggestions in 
https://github.com/apache/datafusion/blob/3a41cc6078ba71821d3ccceb944e9a5eee16774e/datafusion/physical-plan/src/execution_plan.rs#L306-L319
 more specific.
   
   As an aside, I am going to talk about our use of Tokio in DataFusion at 
the Tokio conference in April, and this is definitely one of the items I will 
discuss:
   - https://github.com/apache/datafusion/issues/19770


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

