Dandandan edited a comment on pull request #546: URL: https://github.com/apache/arrow-datafusion/pull/546#issuecomment-860194116
> maybe a better way is to use rayon In my opinion we should try to stay away from Rayon and probably also should stay away from introducing parallelism at a low level (per expression) as it will likely add quite some overhead (in terms of extra threads, memory allocations hold on to and Rayon scheduling overhead). I added some more thoughts in this discussion https://the-asf.slack.com/archives/C01QUFS30TD/p1623571458296000?thread_ts=1623523766.291500&cid=C01QUFS30TD Or we should have some convincing benchmarks / reasoning why at a certain part it is reasonable to introduce parallelism using tool x. I think for other parts of the code we could use Tokios `spawn_blocking` function at the moment, while we design a better way to do parallelism. On the general level I think we mostly should check whether we perform parallelism at a partition / file level and introduce enough parallelism in the query plan (e.g. by using partitioning https://medium.com/@danilheres/increasing-the-level-of-parallelism-in-datafusion-4-0-d2a15b5a2093 ). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org