Dandandan commented on issue #1359:
URL: 
https://github.com/apache/datafusion-ballista/issues/1359#issuecomment-3742506155

   Something needs to track the count of rows being read in the partitions / 
tasks and early 
   
   > > Something that might be useful to include, that I think Spark-like AQE 
does not do, but is a super useful optimization is early stopping query 
execution based on _global_ limit during a scan (e.g. `select * where x limit 
100`).
   > > This is not possible with just reoptimizing after a full stage or even a 
task has been finished (at that point other running tasks might have also 
already scanned way more than necessary). This can avoid using a lot of 
resources / improve query latency _a lot_ for highly selective queries.
   > 
   > how can we stop stage in this case ?
   
   Something needs to track the sum of count of rows being written during scan, 
and cancel all the tasks/partitions (from schedulur) once the sum exceeds the 
limit.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to