asolimando commented on code in PR #4557:
URL: https://github.com/apache/calcite/pull/4557#discussion_r2384022473


##########
core/src/test/resources/sql/planner.iq:
##########
@@ -223,15 +224,16 @@ select a from (values (1.0), (4.0), (null)) as t3 (a);
 
 !ok
 
-EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1)], A=[$t1])
-  EnumerableNestedLoopJoin(condition=[OR(AND(IS NULL(CAST($0):DECIMAL(11, 1)), 
IS NULL(CAST($1):DECIMAL(11, 1))), =(CAST($0):DECIMAL(11, 1), 
CAST($1):DECIMAL(11, 1)))], joinType=[anti])
-    EnumerableAggregate(group=[{0}])
-      EnumerableNestedLoopJoin(condition=[=(CAST($0):DECIMAL(11, 1) NOT NULL, 
CAST($1):DECIMAL(11, 1) NOT NULL)], joinType=[anti])
-        EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1) NOT 
NULL], A=[$t1])
-          EnumerableValues(tuples=[[{ 1.0 }, { 2.0 }, { 3.0 }, { 4.0 }, { 5.0 
}]])
-        EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1) NOT 
NULL], A=[$t1])
-          EnumerableValues(tuples=[[{ 1 }, { 2 }]])
-    EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1)], 
A=[$t1])
+EnumerableCalc(expr#0..1=[{inputs}], expr#2=[CAST($t0):DECIMAL(11, 1)], 
A=[$t2])
+  EnumerableHashJoin(condition=[=($1, $3)], joinType=[anti])
+    EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1)], 
proj#0..1=[{exprs}])
+      EnumerableAggregate(group=[{0}])
+        EnumerableNestedLoopJoin(condition=[=(CAST($0):DECIMAL(11, 1) NOT 
NULL, CAST($1):DECIMAL(11, 1) NOT NULL)], joinType=[anti])
+          EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1) 
NOT NULL], A=[$t1])
+            EnumerableValues(tuples=[[{ 1.0 }, { 2.0 }, { 3.0 }, { 4.0 }, { 
5.0 }]])
+          EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1) 
NOT NULL], A=[$t1])
+            EnumerableValues(tuples=[[{ 1 }, { 2 }]])
+    EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1)], 
A=[$t1], A0=[$t1])

Review Comment:
   I took a deeper look, the root cause is 
[IntersectToSemiJoinRule.java#L119-L128](https://github.com/apache/calcite/blob/9014934d8c24a5242a6840efe20134e820426c24/core/src/main/java/org/apache/calcite/rel/rules/IntersectToSemiJoinRule.java#L119-L128),
 which repeatedly creates cast expressions between pair of intersect operands 
repeatedly, while we could "pre-compute" these join keys targeting the 
`leastRowType`, which is the final type that all intersect operands must 
conform to, instead of pair-wise between intersect operands, which introduces 
duplicates and noise due to the partial type unification vs the stable type 
unification against `leastRowType`.
   
   All these cast expressions are potentially pushed down by 
`JoinPushExpressionsRule` as follows:
   ```
   LogicalJoin(condition=[=(CAST($0):DECIMAL(11, 1) NOT NULL, 
CAST($1):DECIMAL(11, 1) NOT NULL)], joinType=[semi])
     LogicalProject(subset=[rel#37:RelSubset#1.NONE.[0]], 
A=[CAST($0):DECIMAL(11, 1) NOT NULL])
       LogicalValues(subset=[rel#35:RelSubset#0.NONE.[0]], tuples=[[{ 1.0 }, { 
2.0 }, { 3.0 }, { 4.0 }, { 5.0 }]])
   ```
   becomes:
   ```
     LogicalJoin(condition=[=($1, $3)], joinType=[semi])
       LogicalProject(A=[$0], A0=[CAST($0):DECIMAL(11, 1) NOT NULL])
         LogicalProject(subset=[rel#37:RelSubset#1.NONE.[0]], 
A=[CAST($0):DECIMAL(11, 1) NOT NULL])
           LogicalValues(subset=[rel#35:RelSubset#0.NONE.[0]], tuples=[[{ 1.0 
}, { 2.0 }, { 3.0 }, { 4.0 }, { 5.0 }]])
   ```
   
   This wasn't happening before due to the more complex join condition, which 
included an `OR` with `IS NULL` (`LogicalJoin(condition=[OR(AND(IS 
NULL(CAST($0):DECIMAL(11, 1)), IS NULL(CAST($1):DECIMAL(11, 1))), 
=(CAST($0):DECIMAL(11, 1), CAST($1):DECIMAL(11, 1)))], joinType=[semi])` vs 
`LogicalJoin(condition=[=(CAST($0):DECIMAL(11, 1), CAST($1):DECIMAL(11, 1))], 
joinType=[semi])` now due to simplifications allowed by the improved lossless 
cast detection). 
   
   It suffices to compare the logs to find `JoinPushExpressionsRule` being 
applied in the new code while it wasn't the case in the old code:
   
   New code:
   ```
   JoinPushExpressionsRule:
   DEBUG - Transform to: rel#106 via JoinPushExpressionsRule
   DEBUG - call#422: Full plan for rule input [rel#75:LogicalJoin]: 
   LogicalJoin(condition=[=(CAST($0):DECIMAL(11, 1), CAST($1):DECIMAL(11, 1))], 
joinType=[semi])
     LogicalIntersect(subset=[rel#42:RelSubset#4.NONE.[]], all=[false])
       LogicalProject(subset=[rel#37:RelSubset#1.NONE.[0]], 
A=[CAST($0):DECIMAL(11, 1) NOT NULL])
         LogicalValues(subset=[rel#35:RelSubset#0.NONE.[0]], tuples=[[{ 1.0 }, 
{ 2.0 }, { 3.0 }, { 4.0 }, { 5.0 }]])
       LogicalProject(subset=[rel#40:RelSubset#3.NONE.[0]], 
A=[CAST($0):DECIMAL(11, 1) NOT NULL])
         LogicalValues(subset=[rel#38:RelSubset#2.NONE.[0]], tuples=[[{ 1 }, { 
2 }]])
     LogicalProject(subset=[rel#45:RelSubset#6.NONE.[0]], 
A=[CAST($0):DECIMAL(11, 1)])
       LogicalValues(subset=[rel#43:RelSubset#5.NONE.[0]], tuples=[[{ 1.0 }, { 
4.0 }, { null }]])
   
   DEBUG - call#422: Rule [JoinPushExpressionsRule] produced 
[rel#106:LogicalProject]
   DEBUG - call#422: Full plan for [rel#106:LogicalProject]:
   LogicalProject(A=[$0])
     LogicalJoin(condition=[=($1, $3)], joinType=[semi])
       LogicalProject(A=[$0], A0=[CAST($0):DECIMAL(11, 1)])
         LogicalIntersect(subset=[rel#42:RelSubset#4.NONE.[]], all=[false])
           LogicalProject(subset=[rel#37:RelSubset#1.NONE.[0]], 
A=[CAST($0):DECIMAL(11, 1) NOT NULL])
             LogicalValues(subset=[rel#35:RelSubset#0.NONE.[0]], tuples=[[{ 1.0 
}, { 2.0 }, { 3.0 }, { 4.0 }, { 5.0 }]])
           LogicalProject(subset=[rel#40:RelSubset#3.NONE.[0]], 
A=[CAST($0):DECIMAL(11, 1) NOT NULL])
             LogicalValues(subset=[rel#38:RelSubset#2.NONE.[0]], tuples=[[{ 1 
}, { 2 }]])
       LogicalProject(A=[$0], A0=[CAST($0):DECIMAL(11, 1)])
         LogicalProject(subset=[rel#45:RelSubset#6.NONE.[0]], 
A=[CAST($0):DECIMAL(11, 1)])
           LogicalValues(subset=[rel#43:RelSubset#5.NONE.[0]], tuples=[[{ 1.0 
}, { 4.0 }, { null }]])
   
   DEBUG - call#422 generated 1 successors: 
[rel#106:LogicalProject.NONE.[](input=LogicalJoin#105,inputs=0)]
   ```
   
   Old code: `DEBUG - call#422 generated 0 successors.`
   
   Concluding, the root cause is in `IntersectToSemiJoinRule` which could be 
improved as discussed, the extra cast simplifications happening due to the 
improved detection of lossless casts unlock further transformations uncovering 
this "issue"/redundancy (not a correctness issue, just an improvement, so not 
critical).
   
   I have filed CALCITE-7203 to track this improvement.
   
   @xiedeyantu, if you are happy with the other two changes you requested and 
that I have already pushed, I plan to merge this PR soon as the two other 
reviewers have approved already.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to