alamb commented on issue #13692: URL: https://github.com/apache/datafusion/issues/13692#issuecomment-4388022999
> The basic idea would be, rather than move the IO off to a separate runtime away from the CPU bound tasks, instead wrap the CPU bound tasks so that they can't starve the runtime.

In my opinion, this is best practice and it can be done today -- see the [thread_pools.rs](https://github.com/apache/datafusion/blob/cd05b417544262f8a6c114e304055da53e5b4162/datafusion-examples/examples/query_planning/thread_pools.rs) example.

There are two challenges I know of with the "multi-threadpool" (Multi-Runtime) approach:
1. It is easy to get wrong (it is easy to run IO and CPU work on the same Runtime; in fact we did this at Influx even when we knew (much) better)
2. To fully use CPUs you need to "over commit" CPUs (e.g. more threads than cores, and let the OS prioritize), which can lead to additional overhead due to context switching.

You can read more about this use case in the tokio issue I filed recently here: https://github.com/tokio-rs/tokio/issues/8085
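For readers who want something concrete, here is a minimal sketch of the "separate pool for CPU work" idea using only the Rust standard library. It is not taken from the linked thread_pools.rs example and does not use tokio or DataFusion; `cpu_heavy` and `run_on_cpu_pool` are hypothetical names invented for illustration. CPU-bound jobs are handed over a channel to a fixed set of worker threads, so whichever thread drives IO never runs them itself.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for CPU-bound work: sum of squares below `n`.
fn cpu_heavy(n: u64) -> u64 {
    (0..n).map(|x| x * x).sum()
}

/// Runs each job on a dedicated pool of `workers` OS threads and returns
/// the results in the same order as `jobs`. (Assumed helper, not a real API.)
fn run_on_cpu_pool(jobs: Vec<u64>, workers: usize) -> Vec<u64> {
    let (job_tx, job_rx) = mpsc::channel::<(usize, u64)>();
    // The receiving end is shared by all workers behind a mutex.
    let job_rx = Arc::new(Mutex::new(job_rx));
    let (res_tx, res_rx) = mpsc::channel::<(usize, u64)>();

    let handles: Vec<_> = (0..workers.max(1))
        .map(|_| {
            let job_rx = Arc::clone(&job_rx);
            let res_tx = res_tx.clone();
            thread::spawn(move || loop {
                // The lock is held only while receiving, not while computing.
                let next = job_rx.lock().unwrap().recv();
                match next {
                    Ok((idx, n)) => res_tx.send((idx, cpu_heavy(n))).unwrap(),
                    Err(_) => break, // job channel closed: no more work
                }
            })
        })
        .collect();
    // Drop the original sender so `res_rx` ends once all workers finish.
    drop(res_tx);

    let count = jobs.len();
    for (idx, n) in jobs.into_iter().enumerate() {
        job_tx.send((idx, n)).unwrap();
    }
    drop(job_tx); // signal the workers that no more jobs are coming

    let mut out = vec![0; count];
    for (idx, value) in res_rx {
        out[idx] = value;
    }
    for handle in handles {
        handle.join().unwrap();
    }
    out
}
```

A caller might size the pool with `std::thread::available_parallelism()` and call `run_on_cpu_pool(vec![10, 100], cores)`. Note that any threads doing IO are then extra threads on top of one worker per core, which is the "over commit" situation described in challenge 2.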
