1fanwang commented on PR #21870: URL: https://github.com/apache/datafusion/pull/21870#issuecomment-4393939725
> would it be worth having a separate canonicalization pre-processing step (called once), and have the simplification code rely on having the expressions already canonicalized (and making sure we keep them that way during simplification)? Sketched the pre-pass and ran into the wrinkle. Shape would be: a new optimizer rule `CanonicalizeExpressions` runs before `SimplifyExpressions`, walks every plan node, canonicalizes all `BinaryExpr`s. Then `SimplifyExpressions` drops its own canonicalize call (`simplify_exprs.rs:120`) and `expr_contains_inner` reverts to `==`. The block is `LogicalPlan::Join`. `Canonicalizer::f_up` at `expr_simplifier.rs:445-474` swaps `<col2> <op> <col1>` → `<col1> <swapped_op> <col2>` whenever `right > left`. For Join on-clauses where the right child's columns sort lexicographically before the left's, that inverts the LR side and breaks the invariant #8780 carved out. A pre-pass would need a Join-aware variant: canonicalize nested `BinaryExpr`s but skip the top-level conjunct/disjunct of `Join.on`. That relocates the #8780 exception to a new rule rather than removing it, and it's a larger surface than the one-line `normalize_eq` dispatch in this PR. The benchmarks above show no measurable cost for the dispatch — my read is the localized change is the lower-risk path for this fix. If a separate canonicalization pre-pass lands later (to enable other rules that benefit from canonical leaves), `expr_contains_inner` can swing back to `==` then. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
