1fanwang commented on PR #21870:
URL: https://github.com/apache/datafusion/pull/21870#issuecomment-4393939725

   > would it be worth having a separate canonicalization pre-processing step 
(called once), and have the simplification code rely on having the expressions 
already canonicalized (and making sure we keep them that way during 
simplification)?
   
   Sketched the pre-pass and ran into the wrinkle.
   
   Shape would be: a new optimizer rule `CanonicalizeExpressions` runs before 
`SimplifyExpressions`, walks every plan node, canonicalizes all `BinaryExpr`s. 
Then `SimplifyExpressions` drops its own canonicalize call 
(`simplify_exprs.rs:120`) and `expr_contains_inner` reverts to `==`.
   
   The block is `LogicalPlan::Join`. `Canonicalizer::f_up` at 
`expr_simplifier.rs:445-474` swaps `<col2> <op> <col1>` → `<col1> <swapped_op> 
<col2>` whenever `right > left`. For Join on-clauses where the right child's 
columns sort lexicographically before the left's, that inverts the LR side and 
breaks the invariant #8780 carved out.
   
   A pre-pass would need a Join-aware variant: canonicalize nested 
`BinaryExpr`s but skip the top-level conjunct/disjunct of `Join.on`. That 
relocates the #8780 exception to a new rule rather than removing it, and it's a 
larger surface than the one-line `normalize_eq` dispatch in this PR.
   
   The benchmarks above show no measurable cost for the dispatch — my read is 
the localized change is the lower-risk path for this fix. If a separate 
canonicalization pre-pass lands later (to enable other rules that benefit from 
canonical leaves), `expr_contains_inner` can swing back to `==` then.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to