pepijnve commented on issue #16353: URL: https://github.com/apache/datafusion/issues/16353#issuecomment-2969854461
> fixes any bugs / adds features over the current one,
> Is "just" cleaner way to implement the same thing (this is also a fine thing to contribute as well).

There are a couple of benefits.

It removes the edge case seen in the interleave operator (or in any `select!`-style code in general). With the current per-stream counter, one stream might want to yield, but the parent stream may respond by polling another stream that happens to be ready. The end result is that two cooperating streams may turn into a non-cooperating stream when they are merged. To fix this, you would need to adjust the merging operator as well, and we're basically back where we started. If all cooperating streams use the same budget, this problem goes away: once the yield point has been hit, all cooperating streams will yield.

Using the task budget also avoids the 'redundant yield' problem in the current version. If you run a simple `SELECT * FROM ...` query today, by default you'll get a `Pending` after every 64 `Ready(RecordBatch)`. With the task budget, a `Pending` is only injected when it's actually necessary. The system automatically does the right thing.

Finally, it aligns the cooperative yielding strategy across the library. `RecordBatchReceiverStream` is implicitly already using this strategy in a way you cannot opt out of. It's better to have one consistent way of solving this cancellation problem once and for all.

I have a patch almost ready. I'll make a draft PR already so this all becomes a bit more tangible.
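To make the merging edge case concrete, here is a minimal sketch in plain Rust with no async runtime. Everything in it (`Source`, `PerStreamCounter`, `SharedBudget`, `poll_merged`, `count_yields`, and the limit of 2) is a hypothetical stand-in invented for illustration, not DataFusion's or any runtime's actual API. It only models the behaviour described above: a merge that returns the first ready child hides the yield of a child that uses its own counter, whereas a budget shared by all children makes every child yield together.

```rust
use std::cell::Cell;
use std::rc::Rc;
use std::task::Poll;

/// Hypothetical stand-in for a cooperating stream (not a DataFusion trait).
trait Source {
    fn poll_next(&mut self) -> Poll<()>;
}

/// Returns `Pending` once after every `limit` ready items it produced itself.
struct PerStreamCounter {
    limit: u32,
    produced: u32,
}

impl Source for PerStreamCounter {
    fn poll_next(&mut self) -> Poll<()> {
        if self.produced == self.limit {
            self.produced = 0;
            return Poll::Pending;
        }
        self.produced += 1;
        Poll::Ready(())
    }
}

/// Consumes one unit of a budget shared by all streams of the same task.
struct SharedBudget {
    budget: Rc<Cell<u32>>,
}

impl Source for SharedBudget {
    fn poll_next(&mut self) -> Poll<()> {
        let left = self.budget.get();
        if left == 0 {
            return Poll::Pending;
        }
        self.budget.set(left - 1);
        Poll::Ready(())
    }
}

/// `select!`-style merge: the first ready child wins; the merge is only
/// `Pending` when every child is `Pending`.
fn poll_merged(children: &mut [Box<dyn Source>]) -> Poll<()> {
    for child in children.iter_mut() {
        if child.poll_next().is_ready() {
            return Poll::Ready(());
        }
    }
    Poll::Pending
}

/// Polls the merged stream `polls` times and counts the `Pending` results.
/// `on_yield` models whatever happens when the task actually yields.
fn count_yields(
    children: &mut [Box<dyn Source>],
    polls: u32,
    mut on_yield: impl FnMut(),
) -> u32 {
    let mut yields = 0;
    for _ in 0..polls {
        if poll_merged(children).is_pending() {
            yields += 1;
            on_yield();
        }
    }
    yields
}

/// Two children that each yield after 2 items, using private counters.
fn per_stream_yields() -> u32 {
    let mut children: Vec<Box<dyn Source>> = vec![
        Box::new(PerStreamCounter { limit: 2, produced: 0 }),
        Box::new(PerStreamCounter { limit: 2, produced: 0 }),
    ];
    count_yields(&mut children, 18, || {})
}

/// Two children drawing from one budget of 2, refilled after each yield.
fn shared_budget_yields() -> u32 {
    let budget = Rc::new(Cell::new(2));
    let mut children: Vec<Box<dyn Source>> = vec![
        Box::new(SharedBudget { budget: budget.clone() }),
        Box::new(SharedBudget { budget: budget.clone() }),
    ];
    let refill = budget.clone();
    count_yields(&mut children, 18, move || refill.set(2))
}
```

In this toy model, 18 polls of the merged stream produce 2 yields with per-stream counters, because the second child keeps masking the first child's `Pending`. The shared budget produces 6 yields, one after every 2 items. A merge operator that polls its children in a different order could mask the per-stream yields even more often.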
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org