crepererum commented on issue #13692:
URL: https://github.com/apache/datafusion/issues/13692#issuecomment-2527658992

   Couple of thoughts:
   
   - **runtime requirements:** `block_in_place` requires the multi-thread 
runtime. That's not a blocker, but we should clearly communicate that this 
means that DF can no longer run under tokio's `current_thread` runtime.
   - **performance:** Some experiments that I did suggested that 
`block_in_place` scales significantly worse than two runtimes (one for CPU 
work, one for IO). However, this may not be the general case. I just had the 
impression that `block_in_place` is not particularly well optimized and in 
fact it requires the tokio scheduler to do some rather nasty thread switching.
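
To make the "separate pool for CPU work" idea concrete, here is a rough, std-only sketch. It is not tokio and not DataFusion code: `sum_of_squares` and `run_on_cpu_pool` are hypothetical names, and plain threads plus channels stand in for a second runtime. The point is only that CPU-bound partitions run on dedicated threads while the caller (which would be the IO side) just hands off work and receives results.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Hypothetical CPU-bound kernel standing in for an operator's work.
fn sum_of_squares(values: &[u64]) -> u64 {
    values.iter().map(|v| v * v).sum()
}

/// Run every partition on a dedicated pool of `workers` CPU threads and
/// return the per-partition results in partition order.
fn run_on_cpu_pool(partitions: Vec<Vec<u64>>, workers: usize) -> Vec<u64> {
    let n = partitions.len();
    let (job_tx, job_rx) = mpsc::channel::<(usize, Vec<u64>)>();
    let job_rx = Arc::new(Mutex::new(job_rx));
    let (res_tx, res_rx) = mpsc::channel::<(usize, u64)>();

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let job_rx = Arc::clone(&job_rx);
            let res_tx = res_tx.clone();
            thread::spawn(move || loop {
                // The lock guard is dropped at the end of this statement,
                // so the queue is not locked while we compute.
                let job = job_rx.lock().unwrap().recv();
                match job {
                    Ok((idx, data)) => {
                        res_tx.send((idx, sum_of_squares(&data))).unwrap();
                    }
                    Err(_) => break, // job queue closed: no more work
                }
            })
        })
        .collect();
    // Drop our own sender so `res_rx` ends once all workers have exited.
    drop(res_tx);

    for (idx, partition) in partitions.into_iter().enumerate() {
        job_tx.send((idx, partition)).unwrap();
    }
    drop(job_tx); // close the job queue

    let mut out = vec![0; n];
    for (idx, value) in res_rx {
        out[idx] = value;
    }
    for handle in handles {
        handle.join().unwrap();
    }
    out
}
```

Calling `run_on_cpu_pool(vec![vec![1, 2, 3], vec![4]], 2)` would compute each partition on the worker threads. With tokio, the receiving side would presumably be an async channel awaited on the IO runtime, but that wiring is not shown here.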
   
   I'm wondering if we should rather write our own async scheduler for the 
async compute graph that can better deal with the DF workload. For IO, we still 
can (and should) use tokio, but I question if shoehorning tokio into the 
CPU-bound workload is really worth it. And FWIW: such a scheduler can -- at 
least to some extent -- be pull OR push-based (requires some wiring, but not 
impossible).
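
For a sense of scale, here is a minimal sketch of what a hand-written scheduler can look like using only `std`. Everything in it (`Scheduler`, `Task`, `YieldNow`, `interleave_demo`) is hypothetical and purely illustrative: it is a single-threaded FIFO run queue with no work stealing, no IO integration, and no notion of pull vs. push, so it shows the basic mechanics rather than a design for DF.

```rust
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

type BoxFuture = Pin<Box<dyn Future<Output = ()> + Send>>;
type RunQueue = Arc<Mutex<VecDeque<Arc<Task>>>>;

/// A spawned unit of work plus a handle to the queue it re-enqueues on.
struct Task {
    future: Mutex<Option<BoxFuture>>,
    queue: RunQueue,
}

impl Wake for Task {
    /// Waking a task just puts it back at the end of the run queue.
    fn wake(self: Arc<Self>) {
        let queue = Arc::clone(&self.queue);
        queue.lock().unwrap().push_back(self);
    }
}

/// Hypothetical single-threaded FIFO scheduler.
struct Scheduler {
    queue: RunQueue,
}

impl Scheduler {
    fn new() -> Self {
        Scheduler { queue: Arc::new(Mutex::new(VecDeque::new())) }
    }

    fn spawn(&self, fut: impl Future<Output = ()> + Send + 'static) {
        let future: BoxFuture = Box::pin(fut);
        let task = Arc::new(Task {
            future: Mutex::new(Some(future)),
            queue: Arc::clone(&self.queue),
        });
        self.queue.lock().unwrap().push_back(task);
    }

    /// Poll tasks in FIFO order until the run queue is empty.
    fn run(&self) {
        loop {
            // Pop in its own statement so the queue lock is released
            // before polling; `wake` needs that lock to re-enqueue.
            let next = self.queue.lock().unwrap().pop_front();
            let Some(task) = next else { break };
            let waker = Waker::from(Arc::clone(&task));
            let mut cx = Context::from_waker(&waker);
            let mut slot = task.future.lock().unwrap();
            if let Some(mut fut) = slot.take() {
                if fut.as_mut().poll(&mut cx).is_pending() {
                    *slot = Some(fut); // not done yet: keep it for the next wake
                }
            }
        }
    }
}

/// Future that yields to the scheduler exactly once.
struct YieldNow(bool);

impl Future for YieldNow {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// Two tasks that each record their name twice, yielding in between.
fn interleave_demo() -> Vec<&'static str> {
    let log = Arc::new(Mutex::new(Vec::new()));
    let sched = Scheduler::new();
    for name in ["a", "b"] {
        let log = Arc::clone(&log);
        sched.spawn(async move {
            for _ in 0..2 {
                log.lock().unwrap().push(name);
                YieldNow(false).await;
            }
        });
    }
    sched.run();
    let out = log.lock().unwrap().clone();
    out
}
```

Because the queue is FIFO and each task yields after every step, `interleave_demo()` records the two tasks in alternating order. A real scheduler for the compute graph would need multiple worker threads and a scheduling policy that understands operators and partitions, which is where most of the actual work would be.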

