crepererum commented on issue #13692: URL: https://github.com/apache/datafusion/issues/13692#issuecomment-2527658992
Couple of thoughts:

- **runtime requirements:** `block_in_place` requires the multi-thread runtime. That's not a blocker, but we should clearly communicate that DF can then no longer run under the current-thread ("same thread") runtime.
- **performance:** Some experiments that I did suggested that `block_in_place` scales significantly worse than two runtimes (one for CPU work, one for IO). However, this may not be the general case. I just had the impression that `block_in_place` is not particularly well optimized, and in fact it requires the tokio scheduler to do some rather nasty thread switching.

I'm wondering if we should rather write our own async scheduler for the async compute graph that can better deal with the DF workload. For IO, we still can (and should) use tokio, but I question whether shoehorning tokio into the CPU-bound workload is really worth it. And FWIW: such a scheduler can, at least to some extent, be pull OR push-based (requires some wiring, but not impossible).
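To make the "separate executor for CPU work" idea a bit more concrete, here is a minimal, std-only sketch of a dedicated CPU pool that hands results back over a channel. This is only an illustration under assumptions: `CpuPool`, `spawn`, `sum_of_squares`, and `demo` are hypothetical names, none of this is DataFusion or tokio API, and it is neither the scheduler proposed above nor the setup used in the experiments. In a real deployment the caller would more likely be an async task on the IO runtime awaiting a oneshot-style channel instead of blocking on `recv()`.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// A boxed unit of CPU-bound work.
type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical dedicated pool for CPU-bound work, kept apart from IO threads.
struct CpuPool {
    sender: Option<mpsc::Sender<Job>>,
    workers: Vec<thread::JoinHandle<()>>,
}

impl CpuPool {
    fn new(threads: usize) -> Self {
        let (sender, receiver) = mpsc::channel::<Job>();
        let receiver = Arc::new(Mutex::new(receiver));
        let workers = (0..threads)
            .map(|_| {
                let receiver = Arc::clone(&receiver);
                thread::spawn(move || loop {
                    // The lock guard is dropped at the end of this statement,
                    // so the queue is not locked while the job runs.
                    let job = receiver.lock().unwrap().recv();
                    match job {
                        Ok(job) => job(),
                        Err(_) => break, // channel closed: shut down
                    }
                })
            })
            .collect();
        CpuPool { sender: Some(sender), workers }
    }

    /// Submit CPU work; the result arrives on the returned channel.
    fn spawn<T, F>(&self, f: F) -> mpsc::Receiver<T>
    where
        T: Send + 'static,
        F: FnOnce() -> T + Send + 'static,
    {
        let (tx, rx) = mpsc::channel();
        let job: Job = Box::new(move || {
            // Ignore the error if the caller already dropped the receiver.
            let _ = tx.send(f());
        });
        self.sender
            .as_ref()
            .expect("pool is running")
            .send(job)
            .expect("workers are alive");
        rx
    }
}

impl Drop for CpuPool {
    fn drop(&mut self) {
        // Dropping the sender closes the channel, so every worker exits its loop.
        drop(self.sender.take());
        for worker in self.workers.drain(..) {
            let _ = worker.join();
        }
    }
}

/// Stand-in for a CPU-heavy operator.
fn sum_of_squares(n: u64) -> u64 {
    (1..=n).map(|x| x * x).sum()
}

/// Submit all jobs first so they can run in parallel, then collect in order.
fn demo() -> Vec<u64> {
    let pool = CpuPool::new(2);
    let pending: Vec<mpsc::Receiver<u64>> = vec![10u64, 100, 1000]
        .into_iter()
        .map(|n| pool.spawn(move || sum_of_squares(n)))
        .collect();
    pending
        .into_iter()
        .map(|rx| rx.recv().expect("job completed"))
        .collect()
}
```

The point of the sketch is just the separation: the threads that wait on IO never execute the CPU-heavy closure, so nothing like `block_in_place` is needed to keep them responsive. Whether that also holds up performance-wise for the general DF workload is exactly the open question.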