Dandandan commented on issue #1359: URL: https://github.com/apache/datafusion-ballista/issues/1359#issuecomment-3742506155
Something needs to track the count of rows being read in the partitions / tasks and early > > Something that might be useful to include, that I think Spark-like AQE does not do, but is a super useful optimization is early stopping query execution based on _global_ limit during a scan (e.g. `select * where x limit 100`). > > This is not possible with just reoptimizing after a full stage or even a task has been finished (at that point other running tasks might have also already scanned way more than necessary). This can avoid using a lot of resources / improve query latency _a lot_ for highly selective queries. > > how can we stop stage in this case ? Something needs to track the sum of count of rows being written during scan, and cancel all the tasks/partitions (from schedulur) once the sum exceeds the limit. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
