asolimando commented on code in PR #4557:
URL: https://github.com/apache/calcite/pull/4557#discussion_r2384022473
##########
core/src/test/resources/sql/planner.iq:
##########
@@ -223,15 +224,16 @@ select a from (values (1.0), (4.0), (null)) as t3 (a);
!ok
-EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1)], A=[$t1])
- EnumerableNestedLoopJoin(condition=[OR(AND(IS NULL(CAST($0):DECIMAL(11, 1)),
IS NULL(CAST($1):DECIMAL(11, 1))), =(CAST($0):DECIMAL(11, 1),
CAST($1):DECIMAL(11, 1)))], joinType=[anti])
- EnumerableAggregate(group=[{0}])
- EnumerableNestedLoopJoin(condition=[=(CAST($0):DECIMAL(11, 1) NOT NULL,
CAST($1):DECIMAL(11, 1) NOT NULL)], joinType=[anti])
- EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1) NOT
NULL], A=[$t1])
- EnumerableValues(tuples=[[{ 1.0 }, { 2.0 }, { 3.0 }, { 4.0 }, { 5.0
}]])
- EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1) NOT
NULL], A=[$t1])
- EnumerableValues(tuples=[[{ 1 }, { 2 }]])
- EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1)],
A=[$t1])
+EnumerableCalc(expr#0..1=[{inputs}], expr#2=[CAST($t0):DECIMAL(11, 1)],
A=[$t2])
+ EnumerableHashJoin(condition=[=($1, $3)], joinType=[anti])
+ EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1)],
proj#0..1=[{exprs}])
+ EnumerableAggregate(group=[{0}])
+ EnumerableNestedLoopJoin(condition=[=(CAST($0):DECIMAL(11, 1) NOT
NULL, CAST($1):DECIMAL(11, 1) NOT NULL)], joinType=[anti])
+ EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1)
NOT NULL], A=[$t1])
+ EnumerableValues(tuples=[[{ 1.0 }, { 2.0 }, { 3.0 }, { 4.0 }, {
5.0 }]])
+ EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1)
NOT NULL], A=[$t1])
+ EnumerableValues(tuples=[[{ 1 }, { 2 }]])
+ EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1)],
A=[$t1], A0=[$t1])
Review Comment:
I took a deeper look, the root cause is
[IntersectToSemiJoinRule.java#L119-L128](https://github.com/apache/calcite/blob/9014934d8c24a5242a6840efe20134e820426c24/core/src/main/java/org/apache/calcite/rel/rules/IntersectToSemiJoinRule.java#L119-L128),
which repeatedly creates cast expressions between pair of intersect operands
repeatedly, while we could "pre-compute" these join keys targeting the
`leastRowType`, which is the final type that all intersect operands must
conform to, instead of pair-wise between intersect operands, which introduces
duplicates and noise due to the partial type unification vs the stable type
unification against `leastRowType`.
All these cast expressions are potentially pushed down by
`JoinPushExpressionsRule` as follows:
```
LogicalJoin(condition=[=(CAST($0):DECIMAL(11, 1) NOT NULL,
CAST($1):DECIMAL(11, 1) NOT NULL)], joinType=[semi])
LogicalProject(subset=[rel#37:RelSubset#1.NONE.[0]],
A=[CAST($0):DECIMAL(11, 1) NOT NULL])
LogicalValues(subset=[rel#35:RelSubset#0.NONE.[0]], tuples=[[{ 1.0 }, {
2.0 }, { 3.0 }, { 4.0 }, { 5.0 }]])
```
becomes:
```
LogicalJoin(condition=[=($1, $3)], joinType=[semi])
LogicalProject(A=[$0], A0=[CAST($0):DECIMAL(11, 1) NOT NULL])
LogicalProject(subset=[rel#37:RelSubset#1.NONE.[0]],
A=[CAST($0):DECIMAL(11, 1) NOT NULL])
LogicalValues(subset=[rel#35:RelSubset#0.NONE.[0]], tuples=[[{ 1.0
}, { 2.0 }, { 3.0 }, { 4.0 }, { 5.0 }]])
```
This wasn't happening before due to the more complex join condition, which
included an `OR` with `IS NULL` (`LogicalJoin(condition=[OR(AND(IS
NULL(CAST($0):DECIMAL(11, 1)), IS NULL(CAST($1):DECIMAL(11, 1))),
=(CAST($0):DECIMAL(11, 1), CAST($1):DECIMAL(11, 1)))], joinType=[semi])` vs
`LogicalJoin(condition=[=(CAST($0):DECIMAL(11, 1), CAST($1):DECIMAL(11, 1))],
joinType=[semi])` now due to simplifications allowed by the improved lossless
cast detection).
It suffices to compare the logs to find `JoinPushExpressionsRule` being
applied in the new code while it wasn't the case in the old code:
New code:
```
JoinPushExpressionsRule:
DEBUG - Transform to: rel#106 via JoinPushExpressionsRule
DEBUG - call#422: Full plan for rule input [rel#75:LogicalJoin]:
LogicalJoin(condition=[=(CAST($0):DECIMAL(11, 1), CAST($1):DECIMAL(11, 1))],
joinType=[semi])
LogicalIntersect(subset=[rel#42:RelSubset#4.NONE.[]], all=[false])
LogicalProject(subset=[rel#37:RelSubset#1.NONE.[0]],
A=[CAST($0):DECIMAL(11, 1) NOT NULL])
LogicalValues(subset=[rel#35:RelSubset#0.NONE.[0]], tuples=[[{ 1.0 },
{ 2.0 }, { 3.0 }, { 4.0 }, { 5.0 }]])
LogicalProject(subset=[rel#40:RelSubset#3.NONE.[0]],
A=[CAST($0):DECIMAL(11, 1) NOT NULL])
LogicalValues(subset=[rel#38:RelSubset#2.NONE.[0]], tuples=[[{ 1 }, {
2 }]])
LogicalProject(subset=[rel#45:RelSubset#6.NONE.[0]],
A=[CAST($0):DECIMAL(11, 1)])
LogicalValues(subset=[rel#43:RelSubset#5.NONE.[0]], tuples=[[{ 1.0 }, {
4.0 }, { null }]])
DEBUG - call#422: Rule [JoinPushExpressionsRule] produced
[rel#106:LogicalProject]
DEBUG - call#422: Full plan for [rel#106:LogicalProject]:
LogicalProject(A=[$0])
LogicalJoin(condition=[=($1, $3)], joinType=[semi])
LogicalProject(A=[$0], A0=[CAST($0):DECIMAL(11, 1)])
LogicalIntersect(subset=[rel#42:RelSubset#4.NONE.[]], all=[false])
LogicalProject(subset=[rel#37:RelSubset#1.NONE.[0]],
A=[CAST($0):DECIMAL(11, 1) NOT NULL])
LogicalValues(subset=[rel#35:RelSubset#0.NONE.[0]], tuples=[[{ 1.0
}, { 2.0 }, { 3.0 }, { 4.0 }, { 5.0 }]])
LogicalProject(subset=[rel#40:RelSubset#3.NONE.[0]],
A=[CAST($0):DECIMAL(11, 1) NOT NULL])
LogicalValues(subset=[rel#38:RelSubset#2.NONE.[0]], tuples=[[{ 1
}, { 2 }]])
LogicalProject(A=[$0], A0=[CAST($0):DECIMAL(11, 1)])
LogicalProject(subset=[rel#45:RelSubset#6.NONE.[0]],
A=[CAST($0):DECIMAL(11, 1)])
LogicalValues(subset=[rel#43:RelSubset#5.NONE.[0]], tuples=[[{ 1.0
}, { 4.0 }, { null }]])
DEBUG - call#422 generated 1 successors:
[rel#106:LogicalProject.NONE.[](input=LogicalJoin#105,inputs=0)]
```
Old code: `DEBUG - call#422 generated 0 successors.`
Concluding, the root cause is in `IntersectToSemiJoinRule` which could be
improved as discussed, the extra cast simplifications happening due to the
improved detection of lossless casts unlock further transformations uncovering
this "issue"/redundancy (not a correctness issue, just an improvement, so not
critical).
I have filed CALCITE-7203 to track this improvement.
@xiedeyantu, if you are happy with the other two changes you requested and
that I have already pushed, I plan to merge this PR soon as the two other
reviewers have approved already.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]