Dandandan edited a comment on pull request #546:
URL: https://github.com/apache/arrow-datafusion/pull/546#issuecomment-860194116


   > maybe a better way is to use rayon
   
   In my opinion we should try to stay away from Rayon and probably also should 
stay away from introducing parallelism at a low level (per expression) as it 
will likely add quite some overhead (in terms of extra threads, memory 
allocations hold on to and Rayon scheduling overhead).
   I added some more thoughts in this discussion
   
https://the-asf.slack.com/archives/C01QUFS30TD/p1623571458296000?thread_ts=1623523766.291500&cid=C01QUFS30TD
   
   Or we should have some convincing benchmarks / reasoning why at a certain 
part it is reasonable to introduce parallelism using tool x.
   
   I think for other parts of the code we could use Tokios  `spawn_blocking` 
function at the moment, while we design a better way to do parallelism.
   
   On the general level I think we mostly should check whether we perform 
parallelism at a partition / file level and introduce enough parallelism in the 
query plan (e.g. by using partitioning 
https://medium.com/@danilheres/increasing-the-level-of-parallelism-in-datafusion-4-0-d2a15b5a2093
 ).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to