Dandandan commented on PR #15380:
URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2804526858

   > > I think for `ExternalSorter` we don't want any additional parallelism as 
the sort is already executed per partition (so additional parallelism is likely 
to hurt rather than help).
   > 
   > In this case, the final merge might become the bottleneck, because SPM 
does not have internal parallelism either: during the final merge only 1 core 
is busy. I think 2 stages of sort-preserving merge are still needed, because 
`ExternalSorter` is blocking but `SPM` is not, so this setup can keep all the 
cores busy after the partial sort is finished. We just have to ensure they 
don't have a merge degree so large that it becomes slow (with optimizations 
like this PR).
   
   Yes, to be clear, I'm not arguing for removing `SortPreservingMergeExec` or 
sorting in two phases altogether, or anything similar. I was just reacting to 
the idea of adding more parallelism in `in_mem_sort_stream`, which probably 
won't help much.
   
   ```
   SortPreservingMergeExec <= Does k-way merging based on input streams, with 
minimal memory overhead, maximizing input parallelism
        SortExec partitions[1,2,3,4,5,6,7,8,9,10] <= Performs in-memory 
*sorting* if possible, for each input partition in parallel, only resorting to 
spill/merge when the data does not fit into memory 
   ```
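
To make the shape of that plan concrete, here is a minimal standalone sketch of the same two stages in plain Rust. It is not DataFusion code: `sort_partitions`, `k_way_merge`, and `sort_then_merge` are hypothetical names, the data is `Vec<i32>` rather than record batch streams, and there is no spilling. It only illustrates that the sorts run per partition in parallel while the merge itself is a single-threaded k-way merge that holds one candidate per input.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;
use std::thread;

/// Stage 1 (stand-in for the per-partition sort): sort each partition
/// on its own thread. Hypothetical helper, not a DataFusion API.
fn sort_partitions(partitions: Vec<Vec<i32>>) -> Vec<Vec<i32>> {
    let handles: Vec<_> = partitions
        .into_iter()
        .map(|mut p| {
            thread::spawn(move || {
                p.sort_unstable();
                p
            })
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("sort thread panicked"))
        .collect()
}

/// Stage 2 (stand-in for the sort-preserving merge): single-threaded
/// k-way merge. The heap holds at most one entry per input run.
fn k_way_merge(sorted: &[Vec<i32>]) -> Vec<i32> {
    // Min-heap of (value, run index, position within run).
    let mut heap = BinaryHeap::new();
    for (idx, run) in sorted.iter().enumerate() {
        if let Some(&v) = run.first() {
            heap.push(Reverse((v, idx, 0usize)));
        }
    }
    let mut out = Vec::with_capacity(sorted.iter().map(Vec::len).sum());
    while let Some(Reverse((v, idx, pos))) = heap.pop() {
        out.push(v);
        // Refill the heap from the run we just consumed from.
        if let Some(&next) = sorted[idx].get(pos + 1) {
            heap.push(Reverse((next, idx, pos + 1)));
        }
    }
    out
}

/// Both stages together: parallel per-partition sort, then one merge.
fn sort_then_merge(partitions: Vec<Vec<i32>>) -> Vec<i32> {
    k_way_merge(&sort_partitions(partitions))
}
```

Calling `sort_then_merge` on a handful of unsorted partitions returns one globally sorted vector. Each heap operation costs O(log k) for k inputs, which is why a very large merge degree makes the single-threaded merge step slow.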


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

