ozankabak commented on code in PR #5035:
URL: https://github.com/apache/arrow-datafusion/pull/5035#discussion_r1090071827
##########
datafusion/core/src/physical_plan/union.rs:
##########
@@ -220,55 +220,55 @@ impl ExecutionPlan for UnionExec {
}
fn output_ordering(&self) -> Option<&[PhysicalSortExpr]> {
- let first_input_ordering = self.inputs[0].output_ordering();
- // If the Union is not partition aware and all the input ordering spec
strictly equal with the first_input_ordering
- // Return the first_input_ordering as the output_ordering
- //
- // It might be too strict here in the case that the input ordering are
compatible but not exactly the same.
- // For example one input ordering has the ordering spec
SortExpr('a','b','c') and the other has the ordering
- // spec SortExpr('a'), It is safe to derive the out ordering with the
spec SortExpr('a').
- if !self.partition_aware
- && first_input_ordering.is_some()
- && self
- .inputs
- .iter()
- .map(|plan| plan.output_ordering())
- .all(|ordering| {
- ordering.is_some()
- && sort_expr_list_eq_strict_order(
- ordering.unwrap(),
- first_input_ordering.unwrap(),
- )
- })
- {
- first_input_ordering
+ // If the Union is partition aware, there is no output ordering.
+ // Otherwise, the output ordering is the "meet" of its input orderings.
+ // The meet is the finest ordering that satisfied by all the input
+ // orderings, see https://en.wikipedia.org/wiki/Join_and_meet.
+ if self.partition_aware {
+ return None;
+ }
+ // To find the meet, we first find the smallest input ordering.
+ let mut smallest: Option<&[PhysicalSortExpr]> = None;
Review Comment:
Regarding 1, the initial implementation treats input orderings "atomically",
so it finds the meet in that sense. I think you are suggesting we find the meet
by "looking inside". So the initial implementation is overly conservative; i.e.
wrong in a relatively harmless way (it will lose fine ordering information but
not come up with a wrong ordering). Nevertheless, I agree that we should fix it.
Regarding 2, what would be your suggestion? Does it make sense to check
ordering without any equivalence treatment or normalization in this use case?
Happy to fix this quickly based on your feedback. Thanks for the detailed
review, and no worries about the late response.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]