pepijnve commented on PR #19360:
URL: https://github.com/apache/datafusion/pull/19360#issuecomment-3667018953

   > @pepijnve would your proposal be to add the appropriate consume_budget 
calls in group by hash instead?
   
   That's a tricky question because I don't think there's a clear, 'best' 
choice.
   
   A very quick recap first for the people who weren't involved in this work. 
Here's a quick and dirty adaptation of an example from the [Tokio 
docs](https://docs.rs/tokio/latest/tokio/task/coop/index.html#cooperative-scheduling)
 that resembles the type of code you typically see in DataFusion:
   
   ```rust
    use std::pin::Pin;
    use std::task::{ready, Context, Poll};
    use futures::StreamExt; // for poll_next_unpin
    use datafusion::physical_plan::RecordBatchStream;

    fn drop_all(mut input: Pin<&mut (dyn RecordBatchStream + Send)>, cx: &mut Context<'_>) -> Poll<()> {
        // Pull batches until the stream is exhausted, discarding each one.
        while let Some(_) = ready!(input.poll_next_unpin(cx)) {}
        Poll::Ready(())
    }
    ```
   
   In Tokio, a simple loop like that can be problematic. If every 
`input.poll_next_unpin()` invocation returns `Poll::Ready`, the while loop 
keeps going until the stream is completely consumed. The visible effect is 
that the task calling `drop_all` cannot be canceled while the loop is running.
   
   The solution is to periodically "yield to the runtime", which in practice 
means making `drop_all` break out of the loop every now and then by returning 
`Poll::Pending` (after arranging for the task to be woken again).
   
   So in this example, what's the appropriate fix to achieve that? There are 
really only two options:
   1. ensure every possible `input` value returns `Poll::Pending` periodically
   2. adapt the `drop_all` function so that it returns `Poll::Pending` 
periodically itself
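
   To make option 2 concrete, here's a minimal, runtime-free sketch of what a 
budget-aware draining loop could look like. Everything here (`MockStream`, 
`BUDGET`, `count_yields`, the no-op waker) is illustrative scaffolding, not 
DataFusion or Tokio API; the point is only the shape of the loop: count 
consumed items, and once a budget is spent, wake the waker and return 
`Poll::Pending`.

   ```rust
    use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

    // Illustrative stand-in for a RecordBatchStream whose poll is always Ready.
    struct MockStream { remaining: u32 }

    impl MockStream {
        fn poll_next(&mut self, _cx: &mut Context<'_>) -> Poll<Option<u32>> {
            if self.remaining == 0 {
                Poll::Ready(None)
            } else {
                self.remaining -= 1;
                Poll::Ready(Some(self.remaining))
            }
        }
    }

    // Hypothetical per-poll budget, similar in spirit to Tokio's internal one.
    const BUDGET: u32 = 128;

    // Option 2: the draining loop itself yields after consuming BUDGET items.
    fn drop_all(input: &mut MockStream, cx: &mut Context<'_>) -> Poll<()> {
        let mut consumed = 0;
        loop {
            match input.poll_next(cx) {
                Poll::Ready(Some(_)) => {
                    consumed += 1;
                    if consumed >= BUDGET {
                        // Make sure we get polled again, then yield control.
                        cx.waker().wake_by_ref();
                        return Poll::Pending;
                    }
                }
                Poll::Ready(None) => return Poll::Ready(()),
                Poll::Pending => return Poll::Pending,
            }
        }
    }

    // Minimal no-op waker so the sketch runs without an async runtime.
    fn noop_waker() -> Waker {
        fn clone(_: *const ()) -> RawWaker { RawWaker::new(std::ptr::null(), &VTABLE) }
        fn noop(_: *const ()) {}
        static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
        unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
    }

    // Drain `total` items and count how often drop_all yielded in between.
    fn count_yields(total: u32) -> u32 {
        let waker = noop_waker();
        let mut cx = Context::from_waker(&waker);
        let mut stream = MockStream { remaining: total };
        let mut yields = 0;
        while drop_all(&mut stream, &mut cx).is_pending() {
            yields += 1;
        }
        yields
    }

    fn main() {
        // 300 items with a budget of 128: the loop yields after items 128
        // and 256, so two Poll::Pending results before completion.
        println!("yields = {}", count_yields(300));
    }
    ```

   The key detail is `wake_by_ref` before `Poll::Pending`: without it the task 
would yield but never be rescheduled.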
   
   If we map this onto one of the problematic queries from the reproduction case
   
   ```
   SELECT sum(t1.v + t2.v) FROM t1, t2
   ```
   
   the loop here is introduced by the `sum` aggregation. If you remove it from 
the query, then the query cancels just fine. The fix we're discussing so far in 
this PR is option 1 described above. The alternative of changing the 
aggregation operator (and possibly other stream draining operators) would be 
option 2.
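
   For contrast, option 1 can be sketched in the same runtime-free style: 
instead of changing the consumer, wrap the input stream so that it injects a 
`Poll::Pending` every so many items, which forces any draining loop built on 
`ready!` to yield. The names here (`YieldEvery`, `AlwaysReady`, 
`count_pendings`) are illustrative only, not the actual types in this PR:

   ```rust
    use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

    // Illustrative inner stream whose poll is always immediately Ready.
    struct AlwaysReady { remaining: u32 }

    impl AlwaysReady {
        fn poll_next(&mut self, _cx: &mut Context<'_>) -> Poll<Option<u32>> {
            if self.remaining == 0 {
                Poll::Ready(None)
            } else {
                self.remaining -= 1;
                Poll::Ready(Some(self.remaining))
            }
        }
    }

    // Option 1: a wrapper that returns Poll::Pending every `period` items,
    // so every consumer loop yields without having to be modified itself.
    struct YieldEvery { inner: AlwaysReady, period: u32, since_yield: u32 }

    impl YieldEvery {
        fn poll_next(&mut self, cx: &mut Context<'_>) -> Poll<Option<u32>> {
            if self.since_yield >= self.period {
                self.since_yield = 0;
                // Schedule ourselves again before yielding.
                cx.waker().wake_by_ref();
                return Poll::Pending;
            }
            let res = self.inner.poll_next(cx);
            if matches!(res, Poll::Ready(Some(_))) {
                self.since_yield += 1;
            }
            res
        }
    }

    // Minimal no-op waker so the sketch runs without an async runtime.
    fn noop_waker() -> Waker {
        fn clone(_: *const ()) -> RawWaker { RawWaker::new(std::ptr::null(), &VTABLE) }
        fn noop(_: *const ()) {}
        static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
        unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
    }

    // Drain the wrapped stream and count the injected Poll::Pending results.
    fn count_pendings(total: u32, period: u32) -> u32 {
        let waker = noop_waker();
        let mut cx = Context::from_waker(&waker);
        let mut s = YieldEvery { inner: AlwaysReady { remaining: total }, period, since_yield: 0 };
        let mut pendings = 0;
        loop {
            match s.poll_next(&mut cx) {
                Poll::Pending => pendings += 1,
                Poll::Ready(Some(_)) => {}
                Poll::Ready(None) => return pendings,
            }
        }
    }

    fn main() {
        // 10 items with a period of 4: the wrapper injects Pending after
        // items 4 and 8, so the consumer yields twice before finishing.
        println!("pendings = {}", count_pendings(10, 4));
    }
    ```

   The trade-off between the two sketches is exactly the one discussed below: 
the wrapper yields for every consumer of the stream, whether or not that 
consumer actually loops.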
   
   The downside of both options is the same: sprinkling task budget 
consumption around the codebase makes tasks yield more frequently, which may 
introduce performance regressions. The question then is which approach leads 
to the least amount of redundant yielding.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

