asolimando commented on code in PR #4557:
URL: https://github.com/apache/calcite/pull/4557#discussion_r2387220908


##########
core/src/test/resources/sql/planner.iq:
##########
@@ -223,15 +224,16 @@ select a from (values (1.0), (4.0), (null)) as t3 (a);
 
 !ok
 
-EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1)], A=[$t1])
-  EnumerableNestedLoopJoin(condition=[OR(AND(IS NULL(CAST($0):DECIMAL(11, 1)), 
IS NULL(CAST($1):DECIMAL(11, 1))), =(CAST($0):DECIMAL(11, 1), 
CAST($1):DECIMAL(11, 1)))], joinType=[anti])
-    EnumerableAggregate(group=[{0}])
-      EnumerableNestedLoopJoin(condition=[=(CAST($0):DECIMAL(11, 1) NOT NULL, 
CAST($1):DECIMAL(11, 1) NOT NULL)], joinType=[anti])
-        EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1) NOT 
NULL], A=[$t1])
-          EnumerableValues(tuples=[[{ 1.0 }, { 2.0 }, { 3.0 }, { 4.0 }, { 5.0 
}]])
-        EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1) NOT 
NULL], A=[$t1])
-          EnumerableValues(tuples=[[{ 1 }, { 2 }]])
-    EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1)], 
A=[$t1])
+EnumerableCalc(expr#0..1=[{inputs}], expr#2=[CAST($t0):DECIMAL(11, 1)], 
A=[$t2])
+  EnumerableHashJoin(condition=[=($1, $3)], joinType=[anti])
+    EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1)], 
proj#0..1=[{exprs}])
+      EnumerableAggregate(group=[{0}])
+        EnumerableNestedLoopJoin(condition=[=(CAST($0):DECIMAL(11, 1) NOT 
NULL, CAST($1):DECIMAL(11, 1) NOT NULL)], joinType=[anti])
+          EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1) 
NOT NULL], A=[$t1])
+            EnumerableValues(tuples=[[{ 1.0 }, { 2.0 }, { 3.0 }, { 4.0 }, { 
5.0 }]])
+          EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1) 
NOT NULL], A=[$t1])
+            EnumerableValues(tuples=[[{ 1 }, { 2 }]])
+    EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1)], 
A=[$t1], A0=[$t1])

Review Comment:
   In engineering, the root cause is the earliest, fundamental condition in the 
cause-effect chain such that, if you change it, the symptom won’t recur. It’s 
the source behavior that keeps producing the symptom whenever downstream 
conditions allow it to surface.
   
   We agree on this definition and on how `A0` (and other redundant 
expressions) end up in the final plan. Since you asked specifically "Why is 
there an extra A0 here?", my answer focused on this instance:
   
   By the above definition, the root cause is the current implementation of 
`IntersectToSemiJoinRule`, it can't be the limitation of simplification, 
because that alone isn't enough to reproduce the issue, that's what I mean, I 
am not saying that you can't fix it by improving it nor that I don't 
acknowledge the limitation.
   
   We both agree that the improved DECIMAL lossless-cast simplification turns 
the predicate into a pure equi-join, enabling `JoinPushExpressionsRule` to push 
the casts down. And once those are referenced by the join, nothing removes them 
(as you have correctly stated, and I again agree).
   
   Possible fixes I have considered: 1) remove the root cause (CALCITE-7203: 
precompute keys to leastRowType), or 2) enhance `JoinPushExpressionsRule` to 
detect and reuse equivalent expressions. I went with 1) as it was simpler and 
that's what we typically do, we both simplify at the source and rely on 
existing simplification/pruning, over plugging redundancy checks into every 
rule that introduces predicates.
   
   For this concrete case, we agree CALCITE-7203 is the most efficient, 
localized fix.
   
   More broadly, while we strive to eliminate redundancy, fully complete 
simplification isn’t practical: detecting and removing all equivalent 
expressions can require exponential expansion in worst cases. That’s why we 
prefer targeted source fixes plus general simplification, rather than pushing 
heavy redundancy detection into each rule.
   
   This said, if we can improve simplification we should always strive to do 
that, I totally agree.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to