Asif created SPARK-40362: ---------------------------- Summary: Bug in Canonicalization of expressions like Add & Multiply i.e Commutative Operators Key: SPARK-40362 URL: https://issues.apache.org/jira/browse/SPARK-40362 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.3.0 Reporter: Asif
In the canonicalization code which is now in two stages, canonicalization involving Commutative operators is broken, if they are subexpressions of certain type of expressions which override precanonicalize. Consider following expression: a + b > 10 This GreaterThan expression when canonicalized as a whole for first time, will skip the call to Canonicalize.reorderCommutativeOperators for the Add expression as the GreaterThan's canonicalization used precanonicalize on children ( the Add expression). so if create a new expression b + a > 10 and invoke canonicalize it, the canonicalized versions of these two expressions will not match. The test {quote} test("bug X") { val tr1 = LocalRelation('c.int, 'b.string, 'a.int) val y = tr1.where('a.attr + 'c.attr > 10).analyze val fullCond = y.asInstanceOf[Filter].condition.clone() val addExpr = (fullCond match { case GreaterThan(x: Add, _) => x case LessThan(_, x: Add) => x }).clone().asInstanceOf[Add] val canonicalizedFullCond = fullCond.canonicalized // swap the operands of add val newAddExpr = Add(addExpr.right, addExpr.left) // build a new condition which is same as the previous one, but with operands of //Add reversed val builtCondnCanonicalized = GreaterThan(newAddExpr, Literal(10)).canonicalized assertEquals(canonicalizedFullCond, builtCondnCanonicalized) } {quote} This test fails. The fix which I propose is that for the commutative expressions, the precanonicalize should be overridden and Canonicalize.reorderCommutativeOperators be invoked on the expression instead of at place of canonicalize. effectively for commutative operands ( add, or , multiply , and etc) canonicalize and precanonicalize should be same. I will file a PR for it -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org