zhuqi-lucas commented on issue #16193: URL: https://github.com/apache/datafusion/issues/16193#issuecomment-2915766242
> Yes this is more or less the same issue. PR [#14028](https://github.com/apache/datafusion/pull/14028) proposed adding a yield point at the leaf of the plan when moving from one file to the next. This PR adds yield points closer to the top of the plan tree just below the AggregateExec's stream by wrapping its input and then yields every 64 input batches. I was wondering if that should be row count or time interval based rather than batch count based.
>
> I found [#15314](https://github.com/apache/datafusion/issues/15314) in the meantime as well. This issue provides one concrete and easily reproducible example of a query that cannot be cancelled.
>
> ~The comments on PR [#14028](https://github.com/apache/datafusion/pull/14028) regarding Tokio's `yield_now` are interesting and relevant for PR [#16196](https://github.com/apache/datafusion/pull/16196) as well. Seems like the code pattern should be~
> ~I can run some tests to see what the actual behavior is in the ST and MT Tokio runtimes if that helps.~
>
> Edit: conclusion in PR [#14028](https://github.com/apache/datafusion/pull/14028) discussion was that calling `wake_by_ref` was fine.

Thank you @pepijnve for the review. I did not use a row count because it would require computing batch_size * batch count, and we want to avoid affecting the performance of DataFusion's core execution logic. Even with a batch count of 64, I am wondering whether it will affect core performance when processing huge amounts of data.

I am also wondering whether we could wrap the yield logic outside DataFusion's core exec logic, for example on the datafusion-cli side, if we only want to support Ctrl-C in datafusion-cli. But there seem to be more cases besides datafusion-cli that need to terminate a running stream, for example customers who use gRPC to terminate queries: https://github.com/apache/datafusion/issues/14036#issuecomment-2577862225
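To make the batch-count approach concrete, here is a minimal sketch under stated assumptions. It is not DataFusion code. `BatchStream` is a hypothetical, locally defined stand-in for `futures::Stream` that takes `&mut self` instead of `Pin` so the example needs only `std`. `VecStream`, `YieldStream`, `CountingWaker`, and `drive` are likewise hypothetical names used for illustration. The wrapper counts batches and, every `yield_every` batches, calls `wake_by_ref` and returns `Poll::Pending`. This matches the pattern the quoted discussion concluded was fine, and it gives the runtime a chance to run other tasks, such as one that cancels the query.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for `futures::Stream` (uses `&mut self` instead of `Pin` for brevity).
trait BatchStream {
    type Item;
    fn poll_next(&mut self, cx: &mut Context<'_>) -> Poll<Option<Self::Item>>;
}

/// Hypothetical always-ready input that emits the numbers 0..total as "batches".
struct VecStream {
    batches: std::vec::IntoIter<usize>,
}

impl BatchStream for VecStream {
    type Item = usize;
    fn poll_next(&mut self, _cx: &mut Context<'_>) -> Poll<Option<usize>> {
        // Never returns Pending: models an input that never gives control back on its own.
        Poll::Ready(self.batches.next())
    }
}

/// Wraps an input stream and yields to the executor after every `yield_every` batches.
struct YieldStream<S> {
    inner: S,
    yield_every: usize,
    batches_since_yield: usize,
}

impl<S: BatchStream> BatchStream for YieldStream<S> {
    type Item = S::Item;
    fn poll_next(&mut self, cx: &mut Context<'_>) -> Poll<Option<S::Item>> {
        if self.batches_since_yield >= self.yield_every {
            self.batches_since_yield = 0;
            // Ask to be polled again right away, then return Pending so the
            // runtime can schedule other tasks (e.g. one that drops this stream).
            cx.waker().wake_by_ref();
            return Poll::Pending;
        }
        let poll = self.inner.poll_next(cx);
        if let Poll::Ready(Some(_)) = &poll {
            // Batch-count based: one increment per batch, no row counting needed.
            self.batches_since_yield += 1;
        }
        poll
    }
}

/// Waker that counts how many times it was woken.
struct CountingWaker(AtomicUsize);

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        // The default `wake_by_ref` clones the Arc and calls this method.
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

/// Polls a wrapped stream to completion; returns (batches, pendings, wakes).
/// `yield_every` must be > 0, otherwise the stream never makes progress.
fn drive(total: usize, yield_every: usize) -> (usize, usize, usize) {
    let counter = Arc::new(CountingWaker(AtomicUsize::new(0)));
    let waker = Waker::from(counter.clone());
    let mut cx = Context::from_waker(&waker);
    let mut stream = YieldStream {
        inner: VecStream { batches: (0..total).collect::<Vec<_>>().into_iter() },
        yield_every,
        batches_since_yield: 0,
    };
    let (mut batches, mut pendings) = (0, 0);
    loop {
        match stream.poll_next(&mut cx) {
            Poll::Ready(Some(_)) => batches += 1,
            Poll::Ready(None) => break,
            Poll::Pending => pendings += 1,
        }
    }
    (batches, pendings, counter.0.load(Ordering::SeqCst))
}
```

For example, `drive(10, 4)` delivers all 10 batches and returns `Pending` twice, once after batch 4 and once after batch 8. In this sketch, the per-batch overhead is a single counter increment and comparison, which bears on the performance concern above. A row-count variant would instead add each batch's row count to the counter.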