alamb opened a new issue, #19770: URL: https://github.com/apache/datafusion/issues/19770
### Is your feature request related to a problem or challenge?

At @akurmustafa's great suggestion on https://github.com/apache/datafusion/discussions/18260, I submitted a proposal and was accepted to speak at the first TokioConf about using Tokio as the DataFusion CPU runtime engine.

Here is more detail: https://www.tokioconf.com/speakers

Here is the talk summary:

> ### Using Tokio for CPU-Bound Tasks (Works Really Well)
>
> The Tokio runtime at the heart of the Rust async ecosystem is also a good choice for CPU-heavy jobs such as those found in analytics engines. We will review what makes Tokio a compelling choice for CPU-bound workloads, address common concerns, and report on our experience using Tokio as the thread scheduler for Apache DataFusion.

### Describe the solution you'd like

I want to create this talk and its slides in the open. If the talks aren't recorded, I will also record a second version of the talk.

### Describe alternatives you've considered

The high-level idea is to summarize the findings in https://www.influxdata.com/blog/using-rustlangs-async-tokio-runtime-for-cpu-bound-tasks/ and then refresh the major pitfalls.

Talk outline:

1. Analytic DB 101 + Volcano Model: explain the DataFusion execution model (data flow graphs and vectorized execution)
2. Explain why people thought using tokio for CPU-bound work was bad, and the counterarguments
3. Demonstrate how tokio's scheduler effectively implements the "get_next_batch()" API on the same thread
4. Discuss pitfalls

Major pitfall 1: Using the same async runtime for IO and CPU-bound tasks

* Explain symptoms: everything just slows down under high concurrency; the theory is that this is due to the network protocol's congestion control
* Explain solution: use separate runtimes and thread the IO runtime through to the code that needs it
* TODO: find a DataFusion example of multiple runtimes
* TODO: mention the challenge of having to pass a new runtime to different IO libraries (object_store, etc.)

Major pitfall 2: Hot loops and cancelling

* Basically summarize the contents of https://datafusion.apache.org/blog/2025/06/30/cancellation/ from @pepijnve
* Explain symptoms: the query is cancelled but the plan keeps running
* Solution 1 (the obvious one): no hot loops
* Solution 2 (less obvious): make sure tasks periodically yield back to the scheduler (otherwise tasks keep running and the scheduler never gets a chance to figure out that the consumers have been dropped)

### Additional context

_No response_
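To make Solution 2 of pitfall 2 concrete, here is a minimal sketch using only the Rust standard library, so it runs without Tokio. `BudgetedLoop`, `NoopWaker`, and `run_then_cancel` are hypothetical names invented for illustration and are not DataFusion or Tokio APIs. The loop in `run_then_cancel` plays the role of the scheduler: it polls the future by hand and then drops it, which is what cancellation amounts to for a Rust future.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough for polling by hand in this sketch.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical CPU-bound operator: processes `total` items, but returns
/// `Poll::Pending` after every `budget` items so the caller regains control.
struct BudgetedLoop {
    processed: Arc<AtomicUsize>,
    total: usize,
    budget: usize,
}

impl Future for BudgetedLoop {
    type Output = ();

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        for _ in 0..self.budget {
            if self.processed.load(Ordering::Relaxed) >= self.total {
                return Poll::Ready(());
            }
            // Stand-in for one unit of CPU work.
            self.processed.fetch_add(1, Ordering::Relaxed);
        }
        // Budget exhausted: ask to be polled again, then yield.
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// Polls the future at most `polls` times, then drops it (cancellation).
/// Returns how many items were processed before the drop.
fn run_then_cancel(total: usize, budget: usize, polls: usize) -> usize {
    let processed = Arc::new(AtomicUsize::new(0));
    let mut fut = Box::pin(BudgetedLoop {
        processed: processed.clone(),
        total,
        budget,
    });
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    for _ in 0..polls {
        if fut.as_mut().poll(&mut cx).is_ready() {
            break;
        }
    }
    // Dropping the future cancels it: no further items are processed.
    drop(fut);
    processed.load(Ordering::Relaxed)
}

fn main() {
    // 1_000_000 items, yield every 1_000 items, cancel after 3 polls.
    let done = run_then_cancel(1_000_000, 1_000, 3);
    println!("processed {} items before cancellation", done);
}
```

Without the `Poll::Pending` return, a single `poll` call would run all one million iterations before the caller could drop the future, which is the "cancelled but keeps running" symptom. In real Tokio code the equivalent yield point is typically `tokio::task::yield_now().await` inside the loop.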

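For outline item 1, the pull-based (Volcano-style) model with vectorized batches can be sketched in a few lines of plain Rust. This is an illustration under simplifying assumptions, not DataFusion's implementation: `Batch`, `Operator`, `Scan`, `FilterEven`, and `sum_all` are hypothetical names, batches are plain `Vec<i64>` instead of Arrow `RecordBatch`es, and the operators are synchronous rather than async streams.

```rust
/// Hypothetical stand-in for a columnar batch of rows.
type Batch = Vec<i64>;

/// Pull-based operator: each call returns the next batch, or `None` when done.
trait Operator {
    fn get_next_batch(&mut self) -> Option<Batch>;
}

/// Source that emits the numbers `next..end` in batches of `batch_size`.
struct Scan {
    next: i64,
    end: i64,
    batch_size: usize,
}

impl Operator for Scan {
    fn get_next_batch(&mut self) -> Option<Batch> {
        if self.next >= self.end {
            return None;
        }
        let stop = (self.next + self.batch_size as i64).min(self.end);
        let batch: Batch = (self.next..stop).collect();
        self.next = stop;
        Some(batch)
    }
}

/// Filter that keeps even values, pulling from its child on demand.
struct FilterEven<C: Operator> {
    child: C,
}

impl<C: Operator> Operator for FilterEven<C> {
    fn get_next_batch(&mut self) -> Option<Batch> {
        let batch = self.child.get_next_batch()?;
        // Process the whole batch at once (the "vectorized" part).
        Some(batch.into_iter().filter(|v| v % 2 == 0).collect())
    }
}

/// Drives the plan from the root and sums every value produced.
fn sum_all(mut root: impl Operator) -> i64 {
    let mut total = 0;
    while let Some(batch) = root.get_next_batch() {
        total += batch.iter().sum::<i64>();
    }
    total
}

fn main() {
    let plan = FilterEven {
        child: Scan { next: 0, end: 10, batch_size: 4 },
    };
    println!("sum of even values: {}", sum_all(plan));
}
```

The consumer at the root drives everything: each `get_next_batch()` call pulls one batch through the operator tree on the calling thread. That is the behavior outline item 3 attributes to tokio's scheduler when the operators are expressed as async streams.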