thinkharderdev opened a new issue, #11451: URL: https://github.com/apache/datafusion/issues/11451
### Is your feature request related to a problem or challenge?

Certain operators (e.g. `AggregateExec`, `SortExec`) are "greedy": they keep processing batches for as long as their input produces them, without ever yielding. This can effectively create a hot loop that monopolizes worker threads and starves other tasks in the runtime.

### Describe the solution you'd like

Add an optional `coop_budget` to the `ExecutionOptions`. When set, greedy operators would wrap their base record batch stream in a stream that yields back to the scheduler after every `coop_budget` record batches. The forced yield would only kick in after `coop_budget` batches have been processed without yielding. If the underlying stream yields on its own, the coop budget is reset.

### Describe alternatives you've considered

This can be done outside of DataFusion by inserting the cooperative stream at various places, but it would be nice if this were built into the engine.

### Additional context

I think this is probably not a big issue if you set the partition parallelism to the number of CPU cores, since the IO is fairly well pipelined inside `ParquetExec` and other operators that do IO. However, we have found that in network-IO-heavy workloads (e.g. reading from object storage), scheduling one partition per core leaves the executors underutilized in most cases. The goal of this feature is to be able to oversubscribe the cores to take advantage of IO parallelism while avoiding horrendous tail latencies in particularly CPU-intensive queries.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
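The wrapper proposed under "Describe the solution you'd like" could look roughly like the sketch below. This is not DataFusion code, only an illustration under stated assumptions: `BatchStream` is a hypothetical, std-only stand-in for `futures::Stream` (pinning omitted for brevity), and `CoopStream`, `Counter`, `CountingWaker`, and `drive` are invented names. Only the budget idea comes from the issue: yield after `budget` consecutive ready items, and reset the budget whenever the input yields on its own.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical, std-only stand-in for `futures::Stream` (no pinning).
trait BatchStream {
    type Item;
    fn poll_next(&mut self, cx: &mut Context<'_>) -> Poll<Option<Self::Item>>;
}

/// Wraps a stream and forces a yield after `budget` consecutive ready items.
struct CoopStream<S> {
    inner: S,
    budget: usize,
    remaining: usize,
}

impl<S> CoopStream<S> {
    fn new(inner: S, budget: usize) -> Self {
        assert!(budget > 0, "a zero budget would never make progress");
        Self { inner, budget, remaining: budget }
    }
}

impl<S: BatchStream> BatchStream for CoopStream<S> {
    type Item = S::Item;

    fn poll_next(&mut self, cx: &mut Context<'_>) -> Poll<Option<Self::Item>> {
        if self.remaining == 0 {
            // Budget exhausted: reset it, ask to be polled again, and yield.
            self.remaining = self.budget;
            cx.waker().wake_by_ref();
            return Poll::Pending;
        }
        match self.inner.poll_next(cx) {
            Poll::Ready(Some(item)) => {
                self.remaining -= 1;
                Poll::Ready(Some(item))
            }
            Poll::Ready(None) => Poll::Ready(None),
            Poll::Pending => {
                // The input yielded on its own, so the budget starts over.
                self.remaining = self.budget;
                Poll::Pending
            }
        }
    }
}

/// Toy input that is always ready, standing in for a CPU-bound batch source.
struct Counter {
    next: u32,
    end: u32,
}

impl BatchStream for Counter {
    type Item = u32;

    fn poll_next(&mut self, _cx: &mut Context<'_>) -> Poll<Option<u32>> {
        if self.next < self.end {
            self.next += 1;
            Poll::Ready(Some(self.next - 1))
        } else {
            Poll::Ready(None)
        }
    }
}

/// Waker that only counts wake-ups, so the demo needs no async runtime.
struct CountingWaker(AtomicUsize);

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

/// Polls to completion; returns the items and how often the stream yielded.
fn drive<S: BatchStream>(stream: &mut S) -> (Vec<S::Item>, usize) {
    let waker = Waker::from(Arc::new(CountingWaker(AtomicUsize::new(0))));
    let mut cx = Context::from_waker(&waker);
    let (mut items, mut yields) = (Vec::new(), 0);
    loop {
        match stream.poll_next(&mut cx) {
            Poll::Ready(Some(item)) => items.push(item),
            Poll::Ready(None) => return (items, yields),
            // A real executor would run other tasks here before re-polling.
            Poll::Pending => yields += 1,
        }
    }
}
```

For example, `drive(&mut CoopStream::new(Counter { next: 0, end: 10 }, 4))` returns all ten items and reports two forced yields, one after each run of four ready items. In a real integration the wrapper would presumably implement `futures::Stream` over record batches and take its budget from the proposed `coop_budget` option.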
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org