mustafasrepo commented on code in PR #7488:
URL: https://github.com/apache/arrow-datafusion/pull/7488#discussion_r1319488427


##########
datafusion/core/src/physical_plan/repartition/mod.rs:
##########
@@ -637,7 +637,8 @@ impl RepartitionExec {
 
     /// Set Order preserving flag
     pub fn with_preserve_order(mut self, preserve_order: bool) -> Self {
-        self.preserve_order = preserve_order;
+        // Set "preserve order" mode only if the operator cannot maintain the 
order:
+        self.preserve_order = preserve_order && 
!self.maintains_input_order()[0];

Review Comment:
   The reason for this change is that, `RepartitionExec` and 
`SortPreservingRepartitionExec` has different approaches. 
`SortPreservingRepartitionExec` adds additional overhead compared to  
`RepartitionExec`. For this reason wo not want to use 
`SortPreservingRepartitionExec` unless it is necessary. (`preserve_order` flag 
determines working mode, if `true` mode is `SortPreservingRepartitionExec` is 
used otherwise  `RepartitionExec` is used. However, when input partition number 
is 1, `RepartitionExec` can also maintain ordering. Hence for this case, we do 
not need to use `SortPreservingRepartitionExec`). I agree that your proposed 
version is more readable. I changed the implementation according to your 
suggestion



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to