mustafasrepo commented on PR #5171:
URL: 
https://github.com/apache/arrow-datafusion/pull/5171#issuecomment-1420617520

   > > > @alamb @ozankabak @mustafasrepo
   > > > Regarding the global sort replaced to a parallel 
version(SortPreservingMergeExec + Local Sort) optimization, I think there is 
already a rule `GlobalSortSelection` for the exact purpose. I think we should 
not let the Sort Enforcement rule to handle this again. Implement/enhance such 
optimization in the `GlobalSortSelection` rule is more straightforward and do 
not need to care the positions of the `CoalescePartitionsExec`.
   > > 
   > > 
   > > I am not sure how we can do all the local sort + merge substitutions 
just with `GlobalSortSelection`, which doesn't track coalesce operations on 
partitions as you rightly point out. Note that we handle (and 
parallel-optimize) not just top level sorts, but sorts at any depth within the 
plan, even with intermediate executors in between the coalesce operation and 
the sort in question.
   > > We will take a deeper look today and see if we can move over the logic 
to `GlobalSortSelection` while still preserving the same functionality. If we 
can, great -- if not, we will share an example that blocks this. Thank you for 
the suggestion 👍
   > 
   > Yes, please take a look at the `GlobalSortSelection` rule. This rule does 
not need to care about the position of `CoalescePartitionsExec` because 
`CoalescePartitionsExec`s are added by `EnforceDistribution` rule which is 
triggered after the `GlobalSortSelection` rule. The physical Sort Selection 
should happen in a very early stage of the physical optimization phase. I guess 
why the current `GlobalSortSelection` does not optimize all the `Global Sort` 
is because it is not that aggressive and has an additional check. If you 
comment that check, all the `Global Sort` should be replaced.
   > 
   > `&& sort_exec.fetch().is_some()`
   
   @mingmwang your suggestion works. This greatly simplifies the code. Thanks 
for the suggestion.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to