asolimando commented on code in PR #4557:
URL: https://github.com/apache/calcite/pull/4557#discussion_r2387220908
##########
core/src/test/resources/sql/planner.iq:
##########
@@ -223,15 +224,16 @@ select a from (values (1.0), (4.0), (null)) as t3 (a);
!ok
-EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1)], A=[$t1])
- EnumerableNestedLoopJoin(condition=[OR(AND(IS NULL(CAST($0):DECIMAL(11, 1)),
IS NULL(CAST($1):DECIMAL(11, 1))), =(CAST($0):DECIMAL(11, 1),
CAST($1):DECIMAL(11, 1)))], joinType=[anti])
- EnumerableAggregate(group=[{0}])
- EnumerableNestedLoopJoin(condition=[=(CAST($0):DECIMAL(11, 1) NOT NULL,
CAST($1):DECIMAL(11, 1) NOT NULL)], joinType=[anti])
- EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1) NOT
NULL], A=[$t1])
- EnumerableValues(tuples=[[{ 1.0 }, { 2.0 }, { 3.0 }, { 4.0 }, { 5.0
}]])
- EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1) NOT
NULL], A=[$t1])
- EnumerableValues(tuples=[[{ 1 }, { 2 }]])
- EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1)],
A=[$t1])
+EnumerableCalc(expr#0..1=[{inputs}], expr#2=[CAST($t0):DECIMAL(11, 1)],
A=[$t2])
+ EnumerableHashJoin(condition=[=($1, $3)], joinType=[anti])
+ EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1)],
proj#0..1=[{exprs}])
+ EnumerableAggregate(group=[{0}])
+ EnumerableNestedLoopJoin(condition=[=(CAST($0):DECIMAL(11, 1) NOT
NULL, CAST($1):DECIMAL(11, 1) NOT NULL)], joinType=[anti])
+ EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1)
NOT NULL], A=[$t1])
+ EnumerableValues(tuples=[[{ 1.0 }, { 2.0 }, { 3.0 }, { 4.0 }, {
5.0 }]])
+ EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1)
NOT NULL], A=[$t1])
+ EnumerableValues(tuples=[[{ 1 }, { 2 }]])
+ EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1)],
A=[$t1], A0=[$t1])
Review Comment:
In engineering, the root cause is the earliest, fundamental condition in the
cause-effect chain such that, if you change it, the symptom won’t recur. It’s
the source behavior that keeps producing the symptom whenever downstream
conditions allow it to surface.
We agree on this definition and on how `A0` (and other redundant
expressions) end up in the final plan. Since you asked specifically "Why is
there an extra A0 here?", my answer focused on this instance:
By the above definition, the root cause is the current implementation of
`IntersectToSemiJoinRule`, it can't be the limitation of simplification,
because that alone isn't enough to reproduce the issue, that's what I mean, I
am not saying that you can't fix it by improving it nor that I don't
acknowledge the limitation.
We both agree that the improved DECIMAL lossless-cast simplification turns
the predicate into a pure equi-join, enabling `JoinPushExpressionsRule` to push
the casts down. And once those are referenced by the join, nothing removes them
(as you have correctly stated, and I again agree).
Possible fixes I have considered: 1) remove the root cause (CALCITE-7203:
precompute keys to leastRowType), or 2) enhance `JoinPushExpressionsRule` to
detect and reuse equivalent expressions. I went with 1) as it was simpler and
that's what we typically do, we both simplify at the source and rely on
existing simplification/pruning, over plugging redundancy checks into every
rule that introduces predicates.
For this concrete case, we agree CALCITE-7203 is the most efficient,
localized fix.
More broadly, while we strive to eliminate redundancy, fully complete
simplification isn’t practical: detecting and removing all equivalent
expressions can require exponential expansion in worst cases. That’s why we
prefer targeted source fixes plus general simplification, rather than pushing
heavy redundancy detection into each rule.
This said, if we can improve simplification we should always strive to do
that, I totally agree.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]