2010YOUY01 commented on issue #15088: URL: https://github.com/apache/datafusion/issues/15088#issuecomment-2713675854
> Hi [@2010YOUY01](https://github.com/2010YOUY01),
>
> I've taken a look into this and found that the issue seems to be more related to `SortExec` itself rather than the physical optimizer. Specifically:
>
> * `SortExec::benefits_from_input_partitioning` returns `vec![false]`, causing `roundrobin_beneficial` to be `false`, which prevents the addition of a round-robin repartition plan.

Thank you for the great insights. I think `SortExec` should benefit from input partitioning: although it already has internal parallelism (it sorts small batches in parallel), the final sort-preserving merge is not parallelizable, so we need this extra repartitioning.

> * `SortExec::required_input_distribution` returns `vec![Distribution::SinglePartition]` when `preserve_partitioning` is not set, meaning `ensure_distribution` also does not attempt to add round-robin repartitioning.
>
> Would it make sense to adjust these two functions? Are there any cases that rely on the current settings? I'd love to hear your thoughts!

This also makes sense to me. To make sure it doesn't conflict with anything else, we can make the change and see what happens to the existing tests.
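
To make the two quoted points concrete, here is a minimal toy model in Rust of how those two return values could gate round-robin repartitioning. It is a sketch under stated assumptions, not DataFusion's actual code: `ToySort`, its `benefits` field, and `would_add_round_robin` are hypothetical names made up for illustration, the `Distribution` enum is a cut-down stand-in, and the `UnspecifiedDistribution` branch for `preserve_partitioning = true` is an assumption not stated in the thread.

```rust
// Toy model only: names mirror the discussion, but none of this is DataFusion's real API.
#[derive(Debug, Clone, PartialEq)]
enum Distribution {
    UnspecifiedDistribution,
    SinglePartition,
}

// Hypothetical stand-in for a sort operator.
struct ToySort {
    preserve_partitioning: bool,
    // Hypothetical knob: what `benefits_from_input_partitioning` reports.
    benefits: bool,
}

impl ToySort {
    // Mirrors the quoted behavior: a single partition is required
    // when `preserve_partitioning` is not set.
    fn required_input_distribution(&self) -> Vec<Distribution> {
        if self.preserve_partitioning {
            // Assumption: no specific distribution is required in this case.
            vec![Distribution::UnspecifiedDistribution]
        } else {
            vec![Distribution::SinglePartition]
        }
    }

    // One flag per child; a sort has exactly one input.
    fn benefits_from_input_partitioning(&self) -> Vec<bool> {
        vec![self.benefits]
    }
}

// Simplified stand-in for the optimizer's decision: round-robin is only
// considered when the operator reports a benefit, does not force a single
// partition, and the input has fewer partitions than the target.
fn would_add_round_robin(
    sort: &ToySort,
    input_partitions: usize,
    target_partitions: usize,
) -> bool {
    let roundrobin_beneficial = sort.benefits_from_input_partitioning()[0];
    let single_required =
        sort.required_input_distribution()[0] == Distribution::SinglePartition;
    roundrobin_beneficial && !single_required && input_partitions < target_partitions
}
```

In this model, `ToySort { preserve_partitioning: false, benefits: false }` never gets a round-robin repartition, which matches the behavior described in the quote; both functions have to change before the decision flips. Whether the real optimizer behaves the same way is exactly what running the existing tests would show.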
