tustvold opened a new issue, #13692: URL: https://github.com/apache/datafusion/issues/13692
### Is your feature request related to a problem or challenge? DataFusion currently performs CPU-bound work on the tokio runtime and this can lead to issues where IO tasks are starved and unable to adequately service the IO. Crucially this occurs long before the actual resources of the threadpool are exhausted, it is expected that if there is no CPU available that IO will be starved, what we want to avoid is IO getting throttled when there is resource available. The accepted wisdom has been to run DataFusion in a separate threadpool and spawn this work off into a separate threadpool, as described [here](https://www.influxdata.com/blog/using-rustlangs-async-tokio-runtime-for-cpu-bound-tasks/). https://github.com/apache/datafusion/issues/12393 seeks to document this, and by extension https://github.com/apache/datafusion/pull/13424 and https://github.com/apache/datafusion/pull/13690 to provide examples on how to achieve this. Unfortunately this approach comes with a lot of both explicit and implicit complexity, and makes it hard to integrate DataFusion into existing codebases making use of tokio. ### Describe the solution you'd like In addition to spawn_blocking, tokio also provides [block_in_place](https://docs.rs/tokio/latest/tokio/task/fn.block_in_place.html), the crucial difference being that block_in_place does not require a `'static` lifetime and can therefore borrow data from the invoking context. The basic idea would therefore be, **rather than move the IO off to a separate runtime away from the CPU bound tasks, instead wrap the CPU bound tasks so that they can't starve the runtime**. In particular the calls to perform potentially blocking work would be wrapped in calls to block_in_place. Ultimately DF has three broad class of task: 1. Fully synchronous code - e.g. optimisation, function evaluation, planning, most physical operators 2. Mixed IO and CPU-bound - e.g. file IO, schema inference, etc... 3. Pure IO - catalog The vast majority of code in DF falls into the fully synchronous bucket (1.), and so it is a natural response to want to special case the less common IO (2. / 3.) instead of the more common CPU-bound code. However, I'd like to posit that conceptually and practically **it is much simpler to dispatch the synchronous code than the asynchronous code**, especially when one considers the ergonomics afforded by block_in_place, and that the items falling into the second bucket are the most complex operators in DataFusion. That being said this does come with some risks: * It might become hard to determine what is sufficiently CPU-bound to warrant spawning * There would potentially be a lot of callsites that would need wrapping * It would be important to use block_in_place only at the granularity of a data batch, to ensure overheads are amortized ### Describe alternatives you've considered We could instead move IO off to a separate runtime ### Additional context _No response_ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org