pepijnve commented on PR #19360: URL: https://github.com/apache/datafusion/pull/19360#issuecomment-3667018953
> @pepijnve would your proposal be to add the appropriate consume_budget calls in group by hash instead? That's a tricky question because I don't think there's a clear, 'best' choice. A very quick recap first for the people who weren't involved in this work. Here's a quick and dirty adaptation of an example from the [Tokio docs](https://docs.rs/tokio/latest/tokio/task/coop/index.html#cooperative-scheduling) that resembles the type of code you typically see in DataFusion: ``` fn drop_all(mut input: Pin<&mut (dyn RecordBatchStream + Send)>, cx: &mut Context<'_>) -> Poll<()> { while let Some(_) = ready!(input.poll_next_unpin(cx)) {} Poll::Ready(()) } ``` In Tokio, a simple loop like that can be problematic. If every `input.poll_next_unpin()` invocation returns `Poll::Ready`, then the while loop will keep on going until the stream is completely consumed. The effect you see then is that the task that's calling `drop_all` is not cancelable. The solution to this is to ensure you periodically "yield to the runtime". The way to do that is to ensure `drop_all` breaks out of the loop every now and then by returning `Poll::Pending`. So in this example, what's the appropriate fix to achieve that? There are really only two options: 1. ensure every possible `input` value returns `Poll::Pending` periodically 2. adapt the `drop_all` function so that it returns `Poll::Pending` periodically itself If we map this onto one of the problematic queries from the reproduction case ``` SELECT sum(t1.v + t2.v) FROM t1, t2 ``` the loop here is introduced by the `sum` aggregation. If you remove it from the query, then the query cancels just fine. The fix we're discussing so far in this PR is option 1 described above. The alternative of changing the aggregation operator (and possibly other stream draining operators) would be option 2. The downside of both options is that if we sprinkle task budget consumption around the codebase that task will yield more frequently which may introduce performance regressions. The question then is which approach leads to the least amount of redundant yielding. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
