alamb commented on PR #19360: URL: https://github.com/apache/datafusion/pull/19360#issuecomment-3739273608
I am just coming back to this issue, as it is affecting some of our customers (who issue a big query, often by accident, and then the fact that the queries don't cancel ties up a non-trivial amount of their resources).

We have deployed this patch into production and will report back on how it works.

The full issue is explained quite well in this blog post (if I may immodestly say so myself): https://datafusion.apache.org/blog/2025/06/30/cancellation/

> So in this example, what's the appropriate fix to achieve that? There are really only two options:

In my mind, the two options map to:

1. **HAVE EACH OPERATOR CHECK AT OUTPUT**: ensure every possible input value returns Poll::Pending periodically
2. **HAVE EACH OPERATOR CHECK AT INPUT**: adapt the drop_all function so that it returns Poll::Pending periodically itself

I think this PR effectively starts us down the path of option 1 (checking at the output of each operator).

The more I think about this, the more I think I understand that @pepijnve is suggesting that we should add the yield checking to the aggregation (and other stream draining operators):

> The alternative of changing the aggregation operator (and possibly other stream draining operators) would be option 2.

I will look into this idea (checking the budget at input for operators) and see what it might look like, as well as trying to make the cancellation suggestions in https://github.com/apache/datafusion/blob/3a41cc6078ba71821d3ccceb944e9a5eee16774e/datafusion/physical-plan/src/execution_plan.rs#L306-L319 more specific.

BTW, as an aside, I am going to talk about our use of Tokio in DataFusion at the Tokio conference in April, and this is definitely one of the items I will discuss - https://github.com/apache/datafusion/issues/19770
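To make the two options concrete, here is a minimal standard-library-only sketch. It is not DataFusion or Tokio code: `PollSource`, `AlwaysReady`, `YieldAtOutput`, `DrainSum`, `NoopWake` and `drive` are all hypothetical names invented for illustration, the "budget" is a plain counter rather than any real runtime budget, and `u64` values stand in for record batches. It only shows where the `Poll::Pending` check would live in each option.

```rust
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical, simplified stand-in for a stream's `poll_next`.
trait PollSource {
    fn poll_next(&mut self, cx: &mut Context<'_>) -> Poll<Option<u64>>;
}

/// An input that is always ready: it never returns `Poll::Pending` by itself.
struct AlwaysReady {
    remaining: u64,
}

impl PollSource for AlwaysReady {
    fn poll_next(&mut self, _cx: &mut Context<'_>) -> Poll<Option<u64>> {
        if self.remaining == 0 {
            return Poll::Ready(None);
        }
        self.remaining -= 1;
        Poll::Ready(Some(1))
    }
}

/// Option 1 (check at OUTPUT): a wrapper that returns `Poll::Pending`
/// after `budget` consecutive ready items from the wrapped source.
struct YieldAtOutput<S> {
    inner: S,
    budget: u32,
    used: u32,
}

impl<S: PollSource> PollSource for YieldAtOutput<S> {
    fn poll_next(&mut self, cx: &mut Context<'_>) -> Poll<Option<u64>> {
        if self.used >= self.budget {
            self.used = 0;
            // Request another poll, then hand control back to the caller.
            cx.waker().wake_by_ref();
            return Poll::Pending;
        }
        let polled = self.inner.poll_next(cx);
        match polled {
            Poll::Ready(Some(_)) => self.used += 1,
            // End of input or a pending input: start counting afresh.
            _ => self.used = 0,
        }
        polled
    }
}

/// A stream-draining operator (think aggregation or `drop_all`).
/// Option 2 (check at INPUT): it consumes at most `budget` items per
/// poll and then returns `Poll::Pending` itself.
struct DrainSum<S> {
    input: S,
    sum: u64,
    budget: u32,
}

impl<S: PollSource> DrainSum<S> {
    fn poll_sum(&mut self, cx: &mut Context<'_>) -> Poll<u64> {
        for _ in 0..self.budget {
            match self.input.poll_next(cx) {
                Poll::Ready(Some(v)) => self.sum += v,
                Poll::Ready(None) => return Poll::Ready(self.sum),
                Poll::Pending => return Poll::Pending,
            }
        }
        // Budget for this poll is used up: yield and ask to be polled again.
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// Waker that does nothing; enough for the busy-polling driver below.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls the operator to completion and returns `(sum, yields)`, where
/// `yields` counts how often the operator returned `Poll::Pending`.
fn drive<S: PollSource>(op: &mut DrainSum<S>) -> (u64, u32) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut yields = 0;
    loop {
        match op.poll_sum(&mut cx) {
            Poll::Ready(sum) => return (sum, yields),
            Poll::Pending => yields += 1,
        }
    }
}
```

In this sketch, each `Poll::Pending` is a point where a real runtime could notice that the query was cancelled. Option 1 is modelled by giving `DrainSum` an effectively unlimited budget (`u32::MAX`) over a `YieldAtOutput` input, and option 2 by giving `DrainSum` a small budget over a bare `AlwaysReady` input. The trade-off is that option 1 puts the check in every producer, while option 2 puts it only in the operators that loop over their input.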
