alamb commented on code in PR #4885:
URL: https://github.com/apache/arrow-datafusion/pull/4885#discussion_r1070051389


##########
datafusion/core/src/physical_optimizer/repartition.rs:
##########
@@ -163,10 +178,34 @@ fn optimize_partitions(
             .children()
             .iter()
             .map(|child| {
+                // does plan itelf (not parent) require its input to
+                // be sorted in some way?
+                let required_input_ordering =
+                    plan_has_required_input_ordering(plan.as_ref());
+
+                let can_reorder_child = if can_reorder {
+                    // parent of `plan` will not use any particular order
+
+                    // if `plan` itself doesn't need order OR
+                    !required_input_ordering ||
+                    // child has no order to preserve
+                        child.output_ordering().is_none()
+                } else {
+                    // parent would like to use the `plan`'s output
+                    // order.
+
+                    // if `plan` doesn't maintain the input order and
+                    // doesn't need the child's output order itself
+                    (!plan.maintains_input_order() &&  
!required_input_ordering) ||
+                    // child has no ordering to preserve
+                        child.output_ordering().is_none()
+                };

Review Comment:
   This is the logic I eventually arrived at after a lot of thought. I tried to 
document the rationale as best I could -- perhaps @ozankabak  and @mustafasrepo 
 might have a few minutes to double check this. I believe this was first 
introduced by https://github.com/apache/arrow-datafusion/pull/4691



##########
datafusion/core/src/physical_optimizer/repartition.rs:
##########
@@ -201,6 +245,14 @@ fn optimize_partitions(
     }
 }
 
+/// Returns true if `plan` requires any of inputs to be sorted in some
+/// way for correctness. If this is true, its output should not be
+/// repartitioned if it would destroy the required order.
+fn plan_has_required_input_ordering(plan: &dyn ExecutionPlan) -> bool {
+    // NB: checking `is_empty()` is not the right check!

Review Comment:
   I lost easily an hour trying to figure this out



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to