zhuliquan commented on code in PR #13315:
URL: https://github.com/apache/datafusion/pull/13315#discussion_r1842616030
##########
datafusion/expr/src/expr.rs:
##########
@@ -1674,6 +1674,69 @@ impl Expr {
}
}
+impl Normalizeable for Expr {
+    fn can_normalize(&self) -> bool {
+        #[allow(clippy::match_like_matches_macro)]
+        match self {
+            Expr::BinaryExpr(BinaryExpr {
+                op:
+                    _op @ (Operator::Plus
+                    | Operator::Multiply
+                    | Operator::BitwiseAnd
+                    | Operator::BitwiseOr
+                    | Operator::BitwiseXor
+                    | Operator::Eq
+                    | Operator::NotEq),
+                ..
+            }) => true,
+            _ => false,
+        }
+    }
+}
+
+impl NormalizeEq for Expr {
+    fn normalize_eq(&self, other: &Self) -> bool {
+        match (self, other) {
+            (
+                Expr::BinaryExpr(BinaryExpr {
+                    left: self_left,
+                    op: self_op,
+                    right: self_right,
+                }),
+                Expr::BinaryExpr(BinaryExpr {
+                    left: other_left,
+                    op: other_op,
+                    right: other_right,
+                }),
+            ) => {
+                if self_op != other_op {
+                    return false;
+                }
+
+                if matches!(
+                    self_op,
+                    Operator::Plus
+                        | Operator::Multiply
+                        | Operator::BitwiseAnd
+                        | Operator::BitwiseOr
+                        | Operator::BitwiseXor
+                        | Operator::Eq
+                        | Operator::NotEq
+                ) {
+                    (self_left.normalize_eq(other_left)
+                        && self_right.normalize_eq(other_right))
+                        || (self_left.normalize_eq(other_right)
+                            && self_right.normalize_eq(other_left))
+                } else {
+                    self_left.normalize_eq(other_left)
+                        && self_right.normalize_eq(other_right)
+                }
+            }
+            (_, _) => self == other,
Review Comment:
Hi @peter-toth, apologies for the delayed commit. I've added more arms to the
`normalize_eq` function to handle commutative `BinaryExpr` comparisons for other
expressions. While working on this, I also noticed that other expressions could
benefit from normalization. For example, with the `InList` and `CaseWhen`
expressions, we can ignore the order of elements.
You can see the relevant code here:
https://github.com/apache/datafusion/blob/cc11692226da7e5dd49caaee2a8c3e66af920d4c/datafusion/expr/src/expr.rs#L2013
https://github.com/apache/datafusion/blob/cc11692226da7e5dd49caaee2a8c3e66af920d4c/datafusion/expr/src/expr.rs#L2034-L2036
In this case, I think the `normalize_eq(&self, other: &Self) -> bool` trait
method is not the best way to handle this scenario, because of its almost
exponential time complexity. It may be better to normalize the expressions
first and then compare them. Do you have any suggestions on how to approach
this?
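
To make the "normalize first, then compare" idea concrete, here is a minimal
sketch on a toy expression type. It is not DataFusion's `Expr`: `ToyExpr`,
`ToyOp`, `normalize`, and this free-standing `normalize_eq` are hypothetical
names used only for illustration. The sketch assumes the expression type can
derive `Ord`, which the real `Expr` may not support as-is.

```rust
// Toy types for illustration only; these are NOT DataFusion's `Expr`/`Operator`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum ToyOp {
    Plus,
    Minus,
    Eq,
}

impl ToyOp {
    /// Operand order does not matter for these operators.
    fn is_commutative(self) -> bool {
        matches!(self, ToyOp::Plus | ToyOp::Eq)
    }
}

#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
enum ToyExpr {
    Column(String),
    Literal(i64),
    Binary {
        op: ToyOp,
        left: Box<ToyExpr>,
        right: Box<ToyExpr>,
    },
    InList {
        expr: Box<ToyExpr>,
        list: Vec<ToyExpr>,
    },
}

/// Rewrites an expression bottom-up into a canonical form:
/// operands of commutative operators and `InList` elements are put
/// into the (arbitrary but fixed) order given by the derived `Ord`.
fn normalize(expr: ToyExpr) -> ToyExpr {
    match expr {
        ToyExpr::Binary { op, left, right } => {
            let mut left = normalize(*left);
            let mut right = normalize(*right);
            if op.is_commutative() && left > right {
                std::mem::swap(&mut left, &mut right);
            }
            ToyExpr::Binary {
                op,
                left: Box::new(left),
                right: Box::new(right),
            }
        }
        ToyExpr::InList { expr, list } => {
            let mut list: Vec<ToyExpr> = list.into_iter().map(normalize).collect();
            list.sort();
            ToyExpr::InList {
                expr: Box::new(normalize(*expr)),
                list,
            }
        }
        // Leaves are already in canonical form.
        leaf => leaf,
    }
}

/// Compares two expressions by normalizing each once and then using `==`,
/// instead of trying both operand pairings at every level.
fn normalize_eq(a: &ToyExpr, b: &ToyExpr) -> bool {
    normalize(a.clone()) == normalize(b.clone())
}
```

With this sketch, `a + 1` and `1 + a` compare equal after normalization, while
`a - 1` and `1 - a` do not. It only swaps the two operands of a single
commutative node; it does not flatten associative chains such as `(a + b) + c`
versus `a + (b + c)`, which would need an extra step.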
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]