zhuqi-lucas commented on issue #16193:
URL: https://github.com/apache/datafusion/issues/16193#issuecomment-2915766242

   > Yes this is more or less the same issue. PR 
[#14028](https://github.com/apache/datafusion/pull/14028) proposed adding a 
yield point at the leaf of the plan when moving from one file to the next. This 
PR adds yield points closer to the top of the plan tree just below the 
AggregateExec's stream by wrapping its input and then yields every 64 input 
batches. I was wondering if that should be row count or time interval based 
rather than batch count based.
   > 
   > I found [#15314](https://github.com/apache/datafusion/issues/15314) in the 
meantime as well. This issue provides one concrete and easily reproducible 
example of a query that cannot be cancelled.
   > 
   > ~The comments on PR 
[#14028](https://github.com/apache/datafusion/pull/14028) regarding Tokio's 
`yield_now` are interesting and relevant for PR 
[#16196](https://github.com/apache/datafusion/pull/16196) as well. Seems like 
the code pattern should be~ ~I can run some tests to see what the actual 
behavior is in the ST and MT Tokio runtimes if that helps.~
   > 
   > Edit: conclusion in PR 
[#14028](https://github.com/apache/datafusion/pull/14028) discussion was that 
calling `wake_by_ref` was fine.
   
   Thank you @pepijnve for the review. The reason I did not use row count is 
that we would need to calculate batch_size * batch count, and we want to avoid 
affecting the performance of DataFusion's core logic. Even with a batch count 
of 64, I am wondering whether it will affect core-logic performance when we 
have huge data. 
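
   To make the per-batch cost concrete, here is a minimal std-only sketch of the batch-count approach described above. It is not the code from PR #16196: `PollBatches`, `YieldEvery`, `Counting`, `CountingWaker`, and `drive` are hypothetical names invented for illustration, and the trait only stands in for a real batch stream. The wrapper adds one counter comparison per poll and, once every `interval` ready batches, calls `wake_by_ref` and returns `Pending` so the runtime gets a chance to observe cancellation.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for a stream of record batches.
trait PollBatches {
    type Item;
    fn poll_next_batch(&mut self, cx: &mut Context<'_>) -> Poll<Option<Self::Item>>;
}

/// Wraps an input and yields once after every `interval` ready batches.
struct YieldEvery<S> {
    input: S,
    interval: usize,
    ready_since_yield: usize,
}

impl<S: PollBatches> PollBatches for YieldEvery<S> {
    type Item = S::Item;

    fn poll_next_batch(&mut self, cx: &mut Context<'_>) -> Poll<Option<S::Item>> {
        if self.ready_since_yield >= self.interval {
            // Yield point: ask to be polled again, then hand control back.
            self.ready_since_yield = 0;
            cx.waker().wake_by_ref();
            return Poll::Pending;
        }
        let polled = self.input.poll_next_batch(cx);
        if let Poll::Ready(Some(_)) = polled {
            self.ready_since_yield += 1;
        }
        polled
    }
}

/// An input that is always ready, like a tight loop over in-memory data.
struct Counting {
    next: usize,
    total: usize,
}

impl PollBatches for Counting {
    type Item = usize;

    fn poll_next_batch(&mut self, _cx: &mut Context<'_>) -> Poll<Option<usize>> {
        if self.next < self.total {
            self.next += 1;
            Poll::Ready(Some(self.next - 1))
        } else {
            Poll::Ready(None)
        }
    }
}

/// Waker that only counts how often it was woken.
struct CountingWaker(AtomicUsize);

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

/// Polls the wrapped input to completion by hand.
/// Returns (batches seen, Pending results seen, wake calls).
fn drive(total: usize, interval: usize) -> (usize, usize, usize) {
    let counter = Arc::new(CountingWaker(AtomicUsize::new(0)));
    let waker = Waker::from(counter.clone());
    let mut cx = Context::from_waker(&waker);
    let mut stream = YieldEvery {
        input: Counting { next: 0, total },
        interval,
        ready_since_yield: 0,
    };
    let (mut batches, mut yields) = (0, 0);
    loop {
        match stream.poll_next_batch(&mut cx) {
            Poll::Ready(Some(_)) => batches += 1,
            Poll::Ready(None) => break,
            Poll::Pending => yields += 1,
        }
    }
    (batches, yields, counter.0.load(Ordering::SeqCst))
}
```

   In this sketch, `drive(200, 64)` sees 200 batches and yields 3 times. A row-count variant would instead add each batch's row count to the counter and compare against a row threshold; whether either variant is measurable on large inputs would need benchmarking.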
   
   I am wondering if we can wrap the yield logic outside the core exec logic 
in DataFusion, for example on the datafusion-cli side, if we only want to 
support Ctrl-C in datafusion-cli. 
   
   But it seems there are more cases besides datafusion-cli that want to 
terminate the stream, for example users who rely on gRPC to terminate:
   
   https://github.com/apache/datafusion/issues/14036#issuecomment-2577862225 
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

