alamb commented on issue #8724: URL: https://github.com/apache/arrow-datafusion/issues/8724#issuecomment-1877868335
I am marking this as a good first issue, but it is really a medium-sized project. However, I think it is well specified and the existing code is straightforward to extend.

The goal is to add this simplification directly to [ExprSimplifier](https://docs.rs/datafusion/latest/datafusion/optimizer/simplify_expressions/struct.ExprSimplifier.html#)

## Canonicalize

First canonicalize any BinaryExprs so:
1. `<literal> <op> <col>` is rewritten to `<col> <op> <literal>` (remember to switch the operator)
2. `<col1> <op> <col2>` is rewritten so that the name of col1 sorts before the name of col2 (`b > a` would be canonicalized to `a < b`)

## Remove redundancy

1. For any chain of `<expr1> AND <expr2> AND <expr3>` remove any identical `expr`s
2. For any chain of `<expr1> OR <expr2> OR <expr3>` remove any identical `expr`s

So for example I would expect the following to be simplified:

```
A=1 AND 1 = A AND A = 3 --> A = 1 AND A = 3
```

```
(A=1 AND (B> 3 OR 3 < B)) --> (A = 1 AND B > 3)
```
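To make the two steps concrete, here is a minimal standalone sketch of the idea. It does not use DataFusion's API: `Expr`, `Op`, `canonicalize`, and `simplify_chain` are hypothetical names invented for illustration. It also assumes that AND/OR chains have already been flattened into lists, which is not how DataFusion represents them.

```rust
// Toy expression model (hypothetical; not DataFusion's `Expr`).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Op { Eq, Lt, Gt, LtEq, GtEq }

impl Op {
    // Operator to use once the operands trade places: `3 < B` becomes `B > 3`.
    fn swap(&self) -> Op {
        match self {
            Op::Eq => Op::Eq,
            Op::Lt => Op::Gt,
            Op::Gt => Op::Lt,
            Op::LtEq => Op::GtEq,
            Op::GtEq => Op::LtEq,
        }
    }
}

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Col(String),
    Lit(i64),
    Binary(Box<Expr>, Op, Box<Expr>),
    And(Vec<Expr>), // assumption: chains are stored pre-flattened
    Or(Vec<Expr>),
}

fn col(name: &str) -> Expr { Expr::Col(name.to_string()) }
fn lit(v: i64) -> Expr { Expr::Lit(v) }
fn bin(l: Expr, op: Op, r: Expr) -> Expr { Expr::Binary(Box::new(l), op, Box::new(r)) }

// Step 1: put binary expressions into a canonical operand order.
fn canonicalize(expr: Expr) -> Expr {
    match expr {
        Expr::Binary(l, op, r) => {
            let (l, r) = (canonicalize(*l), canonicalize(*r));
            let needs_swap = match (&l, &r) {
                (Expr::Lit(_), Expr::Col(_)) => true,   // <literal> <op> <col>
                (Expr::Col(a), Expr::Col(b)) => a > b,  // order columns by name
                _ => false,
            };
            if needs_swap { bin(r, op.swap(), l) } else { bin(l, op, r) }
        }
        Expr::And(items) => simplify_chain(items, Expr::And),
        Expr::Or(items) => simplify_chain(items, Expr::Or),
        other => other,
    }
}

// Step 2: canonicalize each member of a chain, then drop identical members.
// A chain left with a single member collapses to that member.
fn simplify_chain(items: Vec<Expr>, make: fn(Vec<Expr>) -> Expr) -> Expr {
    let mut unique: Vec<Expr> = Vec::new();
    for item in items.into_iter().map(canonicalize) {
        if !unique.contains(&item) {
            unique.push(item);
        }
    }
    if unique.len() == 1 { unique.remove(0) } else { make(unique) }
}

fn main() {
    // A=1 AND 1 = A AND A = 3
    let chain = Expr::And(vec![
        bin(col("A"), Op::Eq, lit(1)),
        bin(lit(1), Op::Eq, col("A")),
        bin(col("A"), Op::Eq, lit(3)),
    ]);
    println!("{:?}", canonicalize(chain));
}
```

The duplicate check only works because canonicalization runs first: `1 = A` and `A = 1` compare as different expressions until the operands are reordered. The linear `contains` scan keeps the sketch short; a real implementation would likely want a hash-based set.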
