alamb commented on code in PR #8780:
URL: https://github.com/apache/arrow-datafusion/pull/8780#discussion_r1446604027
##########
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:
##########
@@ -1594,6 +1641,20 @@ mod tests {
// --- Simplifier tests -----
// ------------------------------
+ #[test]
+ fn test_simplify_canonicalize() {
+ {
+ let expr = lit(1).lt(col("c2")).and(col("c2").gt(lit(1)));
+ let expected = col("c2").gt(lit(1));
+ assert_eq!(simplify(expr), expected);
+ }
+ {
+ let expr = col("c1").lt(col("c2")).and(col("c2").gt(col("c1")));
+ let expected = col("c2").gt(col("c1"));
+ assert_eq!(simplify(expr), expected);
+ }
+ }
Review Comment:
Could you also please add some additional slightly more complex tests that
also use equality and constants (an important use case) such as the ones listed
on
https://github.com/apache/arrow-datafusion/issues/8724#issuecomment-1877868335 ?
```
A=1 AND 1 = A AND A = 3 --> A = 1 AND A = 3
A=B AND A > 5 AND B=A --> A=B AND A > 5
(A=1 AND (B> 3 OR 3 < B)) --> (A = 1 AND B > 3)
```
Also, it would be great to add some negative tests as well like
```
A < 5 AND A >= 5 --> same (I realize this expression can never be true, but
I don't think the simplifier knows how to handle that yet
A > B AND A > B --> same
```
##########
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:
##########
@@ -224,6 +226,51 @@ impl<S: SimplifyInfo> ExprSimplifier<S> {
}
}
+#[allow(rustdoc::private_intra_doc_links)]
+/// Canonicalize any BinaryExprs that are not in canonical form
+/// <literal> <op> <col> is rewritten to <col> <op> <literal>
+/// <col1> <op> <col2> is rewritten so that the name of col1 sorts higher than
col2 (b > a would be canonicalized to a < b)
Review Comment:
I think you can avoid having to add this `rustdoc` lint by using some more
markdown syntax like this
```suggestion
/// Canonicalize any BinaryExprs that are not in canonical form
///
/// `<literal> <op> <col>` is rewritten to `<col> <op> <literal>`
///
/// `<col1> <op> <col2>` is rewritten so that the name of `col1` sorts higher
/// than `col2` (`b > a` would be canonicalized to `a < b`)
```
##########
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:
##########
@@ -224,6 +226,51 @@ impl<S: SimplifyInfo> ExprSimplifier<S> {
}
}
+#[allow(rustdoc::private_intra_doc_links)]
+/// Canonicalize any BinaryExprs that are not in canonical form
+/// <literal> <op> <col> is rewritten to <col> <op> <literal>
+/// <col1> <op> <col2> is rewritten so that the name of col1 sorts higher than
col2 (b > a would be canonicalized to a < b)
+struct Canonicalizer {}
+
+impl Canonicalizer {
+ fn new() -> Self {
+ Self {}
+ }
+}
+
+impl TreeNodeRewriter for Canonicalizer {
+ type N = Expr;
+
+ fn mutate(&mut self, expr: Expr) -> Result<Expr> {
+ if let Expr::BinaryExpr(BinaryExpr { left, op, right }) = expr {
+ match (left.as_ref(), right.as_ref(), op.swap()) {
Review Comment:
You can do something like this to reduce the indent of this function if you
want (not required, I just figured I would point it out given you say you are
looking to improve your rust skills
```suggestion
let Expr::BinaryExpr(BinaryExpr { left, op, right }) = expr else {
return Ok(expr)
};
match (left.as_ref(), right.as_ref(), op.swap()) {
```
##########
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:
##########
@@ -224,6 +226,51 @@ impl<S: SimplifyInfo> ExprSimplifier<S> {
}
}
+#[allow(rustdoc::private_intra_doc_links)]
+/// Canonicalize any BinaryExprs that are not in canonical form
+/// <literal> <op> <col> is rewritten to <col> <op> <literal>
+/// <col1> <op> <col2> is rewritten so that the name of col1 sorts higher than
col2 (b > a would be canonicalized to a < b)
+struct Canonicalizer {}
+
+impl Canonicalizer {
+ fn new() -> Self {
+ Self {}
+ }
+}
+
+impl TreeNodeRewriter for Canonicalizer {
+ type N = Expr;
+
+ fn mutate(&mut self, expr: Expr) -> Result<Expr> {
+ if let Expr::BinaryExpr(BinaryExpr { left, op, right }) = expr {
+ match (left.as_ref(), right.as_ref(), op.swap()) {
+ // <col1> <op> <col2>
+ (Expr::Column(_), Expr::Column(_), Some(swapped_op)) => {
+ if right.canonical_name() > left.canonical_name() {
Review Comment:
Since Column implements `PartialOrd` you don't have to use` canonical_name`
here (which allocates a string)
```suggestion
(Expr::Column(left_col), Expr::Column(right_col),
Some(swapped_op)) => {
if left_col > right_col {
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]