yjshen commented on issue #2079:
URL: 
https://github.com/apache/arrow-datafusion/issues/2079#issuecomment-1083283816


   > it is something I am currently mulling about and experimenting with. I 
agree that using async for CPU-bound work seems a little wonky, but as @alamb 
articulated 
[here](https://thenewstack.io/using-rustlangs-async-tokio-runtime-for-cpu-bound-tasks/)
 there are reasons that it may be the pragmatic choice. I'm trying to collect 
some data so we can make an informed decision 😅
   
   Very much looking forward to it.
   
   > you describe I think is closer to the more traditional plan-driven 
parallelism than morsel-driven parallelism. Tokio is much closer to that paper 
than what you describe as it incorporates notions of dynamic scheduling and 
work-stealing, rayon may be even closer
   
   I think work-stealing in morsel-driven parallelism and work-stealing in 
Tokio are quite different things. Starting from a rough partition of the whole 
dataset, and later **stealing part of the data** from a skewed partition for 
idle worker slots or CPU cores, is quite different from the **task/green-thread 
stealing** that Tokio does. Or am I missing something crucial, such as a way 
for one SendableRecordBatchStream to be processed in parallel by multiple Tokio 
tasks? 🤔
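
   To make the distinction concrete, here is a minimal sketch of the data-stealing side, using only the Rust standard library. It is a simplification and not the paper's dispatcher or anything in DataFusion: `Partition`, `claim`, and `morsel_sum` are hypothetical names, and a plain atomic cursor per partition stands in for a real morsel queue. Each worker drains its own partition morsel by morsel, then claims leftover morsels from the other (possibly skewed) partitions.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Hypothetical partition: some rows plus a cursor to the next unclaimed morsel.
struct Partition {
    rows: Vec<u64>,
    next: AtomicUsize,
}

impl Partition {
    fn new(rows: Vec<u64>) -> Self {
        Partition { rows, next: AtomicUsize::new(0) }
    }

    /// Claim the next morsel (at most `morsel_size` rows), or None once drained.
    /// Any worker may call this, which is what lets idle workers take data
    /// from a partition they do not own.
    fn claim(&self, morsel_size: usize) -> Option<&[u64]> {
        let start = self.next.fetch_add(morsel_size, Ordering::Relaxed);
        if start >= self.rows.len() {
            return None;
        }
        let end = (start + morsel_size).min(self.rows.len());
        Some(&self.rows[start..end])
    }
}

/// Sum all rows with one worker thread per partition.
/// Returns the total and the number of morsels each worker processed.
fn morsel_sum(partitions: &[Partition], morsel_size: usize) -> (u64, Vec<usize>) {
    thread::scope(|s| {
        let handles: Vec<_> = (0..partitions.len())
            .map(|home| {
                s.spawn(move || {
                    let mut sum = 0u64;
                    let mut morsels = 0usize;
                    // Start with the home partition, then visit the others
                    // and take whatever morsels are still unclaimed.
                    for offset in 0..partitions.len() {
                        let p = &partitions[(home + offset) % partitions.len()];
                        while let Some(morsel) = p.claim(morsel_size) {
                            sum += morsel.iter().sum::<u64>();
                            morsels += 1;
                        }
                    }
                    (sum, morsels)
                })
            })
            .collect();

        let mut total = 0u64;
        let mut per_worker = Vec::new();
        for handle in handles {
            let (sum, morsels) = handle.join().unwrap();
            total += sum;
            per_worker.push(morsels);
        }
        (total, per_worker)
    })
}
```

   With one large and one tiny partition, the worker that owns the tiny one finishes early and then helps drain the large one, so the unit being stolen is a slice of data. In Tokio's multi-threaded runtime the unit being stolen is a whole task, and a stream is polled through exclusive access, so stealing a task moves where a stream is polled but does not by itself split that stream's data across workers.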



