yjshen commented on issue #2079: URL: https://github.com/apache/arrow-datafusion/issues/2079#issuecomment-1083283816
> it is something I am currently mulling about and experimenting with. I agree that using async for CPU-bound work seems a little wonky, but as @alamb articulated [here](https://thenewstack.io/using-rustlangs-async-tokio-runtime-for-cpu-bound-tasks/) there are reasons that it may be the pragmatic choice. I'm trying to collect some data so we can make an informed decision 😅

Very much looking forward to it.

> you describe I think is closer to the more traditional plan-driven parallelism than morsel-driven parallelism. Tokio is much closer to that paper than what you describe as it incorporates notions of dynamic scheduling and work-stealing, rayon may be even closer

I think work stealing in morsel-driven parallelism and work stealing in Tokio are quite different things. Starting with a rough partitioning of the whole dataset, and later **stealing part of the data** from a skewed partition to feed idle worker slots or CPU cores, is quite different from the **task/green-thread stealing** that Tokio does.

Or am I missing something crucial, such that one SendableRecordBatchStream can be processed in parallel by multiple Tokio tasks? 🤔
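
To make the distinction concrete, here is a minimal sketch of the data-stealing side, using only the Rust standard library. It is an illustration under stated assumptions, not DataFusion's code or the paper's implementation: `Partition`, `claim`, and `morsel_sum` are hypothetical names, and a plain sum stands in for a real operator. Each worker drains its own partition one morsel at a time, then claims the remaining morsels of other partitions, so a skewed partition ends up being processed by several threads.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical coarse partition: the rows plus a shared cursor, so that
/// any worker can claim the next morsel (a small row range) from it.
struct Partition {
    rows: Vec<u64>,
    next: AtomicUsize,
}

impl Partition {
    /// Atomically claims the next `morsel_size` rows, or returns `None`
    /// once the partition is exhausted. No row is handed out twice.
    fn claim(&self, morsel_size: usize) -> Option<&[u64]> {
        let start = self.next.fetch_add(morsel_size, Ordering::Relaxed);
        if start >= self.rows.len() {
            return None;
        }
        let end = (start + morsel_size).min(self.rows.len());
        Some(&self.rows[start..end])
    }
}

/// Sums all rows with one worker thread per partition. Returns the total
/// and the number of morsels each worker ended up processing.
fn morsel_sum(partitions: Vec<Vec<u64>>, morsel_size: usize) -> (u64, Vec<usize>) {
    assert!(morsel_size > 0, "morsel_size must be positive");
    let parts: Arc<Vec<Partition>> = Arc::new(
        partitions
            .into_iter()
            .map(|rows| Partition { rows, next: AtomicUsize::new(0) })
            .collect(),
    );
    let n = parts.len();

    let handles: Vec<_> = (0..n)
        .map(|home| {
            let parts = Arc::clone(&parts);
            thread::spawn(move || {
                let mut sum = 0u64;
                let mut morsels = 0usize;
                // Drain the home partition first, then steal *data*
                // (morsels) from the other partitions in round-robin order.
                for offset in 0..n {
                    let partition = &parts[(home + offset) % n];
                    while let Some(morsel) = partition.claim(morsel_size) {
                        sum += morsel.iter().sum::<u64>();
                        morsels += 1;
                    }
                }
                (sum, morsels)
            })
        })
        .collect();

    let mut total = 0u64;
    let mut per_worker = Vec::with_capacity(n);
    for handle in handles {
        let (sum, morsels) = handle.join().expect("worker panicked");
        total += sum;
        per_worker.push(morsels);
    }
    (total, per_worker)
}
```

For example, `morsel_sum(vec![(1..=1000).collect(), (1..=10).collect(), Vec::new()], 64)` returns a total of 500555 over 17 morsels in all, while the split of those morsels among the three workers varies from run to run. The unit being stolen here is a range of rows inside one partition. In the task-stealing model described above, the unit is a whole task, so a stream polled by a single task stays with whichever thread currently runs that task.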
