alamb commented on code in PR #17442: URL: https://github.com/apache/datafusion/pull/17442#discussion_r2325732445
########## datafusion/physical-expr-common/src/sort_expr.rs: ########## @@ -414,27 +427,28 @@ impl LexOrdering { self.exprs.truncate(len); true } +} - /// Constructs a new `LexOrdering` from the given sort requirements w/o - /// enforcing non-degeneracy. This function is used internally and is not - /// meant (or safe) for external use. - fn construct(exprs: impl IntoIterator<Item = PhysicalSortExpr>) -> (bool, Self) { - let mut set = HashSet::new(); - let exprs = exprs - .into_iter() - .filter_map(|s| set.insert(Arc::clone(&s.expr)).then_some(s)) - .collect(); - (!set.is_empty(), Self { exprs, set }) +impl PartialEq for LexOrdering { + fn eq(&self, other: &Self) -> bool { + let Self { + exprs, + set: _, // derived from `exprs` Review Comment: 👍 ########## datafusion/physical-expr-common/src/sort_expr.rs: ########## @@ -367,8 +367,21 @@ impl LexOrdering { /// Creates a new [`LexOrdering`] from the given vector of sort expressions. /// If the vector is empty, returns `None`. pub fn new(exprs: impl IntoIterator<Item = PhysicalSortExpr>) -> Option<Self> { - let (non_empty, ordering) = Self::construct(exprs); - non_empty.then_some(ordering) + let exprs = exprs.into_iter(); Review Comment: I am not sure this is better / faster than what was previously used to comute exprs and populate the hashset: ```rust let exprs = exprs .into_iter() .filter_map(|s| set.insert(Arc::clone(&s.expr)).then_some(s)) .collect(); ``` Specifically, I think this will now potentially allocate more memory than required (it has space for all items, but we may filter out some) I personally suggest avoiding the Vec::with_capacity call unless we are sure it is better ########## datafusion/physical-expr-common/src/sort_expr.rs: ########## @@ -414,27 +427,28 @@ impl LexOrdering { self.exprs.truncate(len); true } +} - /// Constructs a new `LexOrdering` from the given sort requirements w/o - /// enforcing non-degeneracy. This function is used internally and is not - /// meant (or safe) for external use. - fn construct(exprs: impl IntoIterator<Item = PhysicalSortExpr>) -> (bool, Self) { - let mut set = HashSet::new(); - let exprs = exprs - .into_iter() - .filter_map(|s| set.insert(Arc::clone(&s.expr)).then_some(s)) - .collect(); - (!set.is_empty(), Self { exprs, set }) +impl PartialEq for LexOrdering { + fn eq(&self, other: &Self) -> bool { + let Self { + exprs, + set: _, // derived from `exprs` + } = self; + // PartialEq must be consistent with PartialOrd + exprs == &other.exprs } } - +impl Eq for LexOrdering {} impl PartialOrd for LexOrdering { /// There is a partial ordering among `LexOrdering` objects. For example, the /// ordering `[a ASC]` is coarser (less) than ordering `[a ASC, b ASC]`. /// If two orderings do not share a prefix, they are incomparable. fn partial_cmp(&self, other: &Self) -> Option<Ordering> { - self.iter() - .zip(other.iter()) + // PartialEq must be consistent with PartialOrd Review Comment: I think this results in the same speed as the previous code, as `iter()` returns `exprs.iter()` https://github.com/apache/datafusion/blob/cc43f5c107bf10daa88be2c58af3042f8aef9108/datafusion/physical-expr-common/src/sort_expr.rs#L494-L493 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org