2010YOUY01 commented on issue #15088:
URL: https://github.com/apache/datafusion/issues/15088#issuecomment-2713675854

   > Hi [@2010YOUY01](https://github.com/2010YOUY01),
   > 
   > I've taken a look into this and found that the issue seems to be more 
related to `SortExec` itself rather than the physical optimizer. Specifically:
   > 
   > * `SortExec::benefits_from_input_partitioning` returns `vec![false]`, 
causing `roundrobin_beneficial` to be `false`, which prevents the addition of a 
round-robin repartition plan.
   
   Thank you for the great insights. I think `SortExec` should benefit from 
input partitioning. Although it already has internal parallelism (it sorts 
small batches in parallel), the final sort-preserving merge is not 
parallelizable, so we need this extra repartitioning.
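
   As a rough illustration of the point above, here is a minimal std-only sketch (not DataFusion code) of "repartition round-robin, sort each partition in parallel, then do one sequential order-preserving merge". The names `round_robin`, `sort_partitions`, `merge_sorted`, and `parallel_sort` are hypothetical, and plain `i64` values stand in for record batches.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;
use std::thread;

/// Toy stand-in for a round-robin repartition: row `i` goes to partition `i % n`.
fn round_robin(rows: Vec<i64>, n: usize) -> Vec<Vec<i64>> {
    let n = n.max(1); // avoid a modulo-by-zero for n == 0
    let mut parts = vec![Vec::new(); n];
    for (i, row) in rows.into_iter().enumerate() {
        parts[i % n].push(row);
    }
    parts
}

/// Sort every partition on its own thread; this is the phase that
/// benefits from having more than one input partition.
fn sort_partitions(parts: Vec<Vec<i64>>) -> Vec<Vec<i64>> {
    let handles: Vec<_> = parts
        .into_iter()
        .map(|mut part| {
            thread::spawn(move || {
                part.sort();
                part
            })
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("sort thread panicked"))
        .collect()
}

/// Sequential k-way merge of already sorted partitions, playing the
/// role of the final sort-preserving merge.
fn merge_sorted(parts: Vec<Vec<i64>>) -> Vec<i64> {
    // Min-heap of (value, partition index, position inside that partition).
    let mut heap = BinaryHeap::new();
    for (idx, part) in parts.iter().enumerate() {
        if let Some(&value) = part.first() {
            heap.push(Reverse((value, idx, 0usize)));
        }
    }
    let mut out = Vec::with_capacity(parts.iter().map(Vec::len).sum());
    while let Some(Reverse((value, idx, pos))) = heap.pop() {
        out.push(value);
        if let Some(&next) = parts[idx].get(pos + 1) {
            heap.push(Reverse((next, idx, pos + 1)));
        }
    }
    out
}

/// Full toy pipeline: repartition, parallel sort, single merge.
fn parallel_sort(rows: Vec<i64>, n: usize) -> Vec<i64> {
    merge_sorted(sort_partitions(round_robin(rows, n)))
}
```

   Calling `parallel_sort(rows, 4)` sorts four partitions concurrently and leaves only the merge step single-threaded. With a single input partition, the whole sort runs on one thread in this toy.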
   
   > * `SortExec::required_input_distribution` returns 
`vec![Distribution::SinglePartition]` when `preserve_partitioning` is not set, 
meaning `ensure_distribution` also does not attempt to add round-robin 
repartitioning.
   > 
   > Would it make sense to adjust these two functions? Are there any cases 
that rely on the current settings? I'd love to hear your thoughts!
   
   This also makes sense to me. To make sure it does not conflict with other 
things, we can make the change and see what happens to the existing tests.
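
   To make the two gates discussed in this thread concrete, here is a deliberately simplified toy model. It is not the optimizer's actual logic: `ToySort`, `ToyDistribution`, and `would_add_round_robin` are hypothetical names, and the `Unspecified` requirement returned when `preserve_partitioning` is set is an assumption of the sketch. It only shows that both methods would need to change before a round-robin repartition could be added.

```rust
/// Toy stand-in for a distribution requirement (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
enum ToyDistribution {
    Unspecified,
    SinglePartition,
}

/// Toy stand-in for a sort operator; `benefits` models the value
/// reported by `benefits_from_input_partitioning`.
struct ToySort {
    benefits: bool,
    preserve_partitioning: bool,
}

impl ToySort {
    /// One flag per child; a sort has a single child.
    fn benefits_from_input_partitioning(&self) -> Vec<bool> {
        vec![self.benefits]
    }

    /// Requires a single partition unless partitioning is preserved
    /// (the `Unspecified` branch is an assumption of this sketch).
    fn required_input_distribution(&self) -> Vec<ToyDistribution> {
        if self.preserve_partitioning {
            vec![ToyDistribution::Unspecified]
        } else {
            vec![ToyDistribution::SinglePartition]
        }
    }
}

/// Simplified decision: repartition only if the operator says it benefits
/// AND its input requirement does not force a single partition.
fn would_add_round_robin(sort: &ToySort) -> bool {
    let beneficial = sort.benefits_from_input_partitioning()[0];
    let allows_many =
        sort.required_input_distribution()[0] != ToyDistribution::SinglePartition;
    beneficial && allows_many
}
```

   In this model, flipping only `benefits` to `true` still yields `false` while the requirement stays `SinglePartition`, which mirrors why both functions were raised in the question.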


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

