pepijnve commented on issue #16353: URL: https://github.com/apache/datafusion/issues/16353#issuecomment-2963507764
@alamb our messages crossed paths in the ether. > I have somewhat lost track of what exactly you are proposing. Sorry, I usually do this around a white board. First and foremost I'm trying to communicate what I see as intrinsic problems with the external optimizer rule approach. This was countered with a belief that it can be solved, but I don't see it yet myself. Happy to be proven wrong of course. I've explained the problem to 3 or 4 colleagues in the meantime as unbiased as I possibly could and they basically all tell me the same. >Is it the approach in one of these PRs: > https://github.com/apache/datafusion/pull/16301 > https://github.com/apache/datafusion/pull/16319 A bit of both :) The rationale behind #16319 is twofold. - By using a JoinHandle you naturally solve the cancellation problem. A JoinHandle is pending once, and wakes when the result is ready. The caller is unblocked immediately and can `select!` on a timeout. The subtle effect of it is that you shift from a cancellation problem for the caller to an abort problem for the spawned task. Aborting an AbortHandle still requires cooperation from the task in order for it to actually stop. - My hunch was that by splitting deep call stacks into multiple chained shallow call stacks the cost of yielding to the runtime could be reduced. That was just a first principles reasoning thing that I wanted to try out. Still working on measuring if it has a performance benefit. #16301 is the variant of yielding where we would punt on the idea of implementing yielding via an optimizer rule or some other automatic mechanism. Instead you state that every `Stream` is responsible for its own behavior (which the guidelines kind of already do). And then we fix the base library implementations. I think this fixes cancellation for everyone except downstream users with custom pipeline breaking operators. For those people I would propose providing utilities in the library to make doing the right thing as low-effort as possible and provide best practice examples. Something akin to "If your code has looping patterns call 'consume_budget' every so often" or "wrap your input stream in a YieldStream". #16301 does not make use of Tokio's coop budget yet however. Instead it's the per-operator counter variant. The diff per operator does illustrate the scope and nature of the change `Stream` implementers would need to make. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org