[
https://issues.apache.org/jira/browse/CALCITE-7203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18023302#comment-18023302
]
Zhen Chen commented on CALCITE-7203:
------------------------------------
I completely agree with your point. I was planning to work on this myself,
but I noticed you've already assigned the issue to yourself. If you'd like to
take it on, I'd be happy to review it. If you'd rather I make the changes,
please let me know.
> IntersectToSemiJoinRule should compute once the join keys and reuse them to
> avoid duplicates
> --------------------------------------------------------------------------------------------
>
> Key: CALCITE-7203
> URL: https://issues.apache.org/jira/browse/CALCITE-7203
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.40.0
> Reporter: Alessandro Solimando
> Assignee: Alessandro Solimando
> Priority: Major
>
> [IntersectToSemiJoinRule|https://github.com/apache/calcite/blob/9014934d8c24a5242a6840efe20134e820426c24/core/src/main/java/org/apache/calcite/rel/rules/IntersectToSemiJoinRule.java#L119-L128]
> repeatedly creates cast expressions between pairs of intersect operands.
> Instead, we could pre-compute these join keys once against the row type of
> the n-way intersect expression, which is the final type that all intersect
> operands must conform to.
> Computing the join keys pairwise, as the current implementation does,
> introduces duplicates and noise, because each pair is only partially
> type-unified, whereas unification over the final/global row type is stable.
> [planner.iq#L150-L179|https://github.com/apache/calcite/blob/9014934d8c24a5242a6840efe20134e820426c24/core/src/test/resources/sql/planner.iq#L150-L179]
> could be simplified;
> before:
> {noformat}
> EnumerableCalc(expr#0..1=[{inputs}], expr#2=[CAST($t0):DECIMAL(11, 1)], A=[$t2])
>   EnumerableHashJoin(condition=[=($1, $3)], joinType=[semi])
>     EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1)], proj#0..1=[{exprs}])
>       EnumerableAggregate(group=[{0}])
>         EnumerableHashJoin(condition=[=($1, $3)], joinType=[semi])
>           EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1) NOT NULL], A=[$t1], A0=[$t1])
>             EnumerableValues(tuples=[[{ 1.0 }, { 2.0 }, { 3.0 }, { 4.0 }, { 5.0 }]])
>           EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1) NOT NULL], A=[$t1], A0=[$t1])
>             EnumerableValues(tuples=[[{ 1 }, { 2 }]])
>     EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1)], A=[$t1], A0=[$t1]) <= extra A0
>       EnumerableValues(tuples=[[{ 1.0 }, { 4.0 }, { null }]]){noformat}
> after:
> {noformat}
> EnumerableAggregate(group=[{0}])
>   EnumerableNestedLoopJoin(condition=[IS NOT DISTINCT FROM($0, $1)], joinType=[semi])
>     EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1)], A=[$t1])
>       EnumerableAggregate(group=[{0}])
>         EnumerableNestedLoopJoin(condition=[IS NOT DISTINCT FROM($0, $1)], joinType=[semi])
>           EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1) NOT NULL], A=[$t1])
>             EnumerableValues(tuples=[[{ 1.0 }, { 2.0 }, { 3.0 }, { 4.0 }, { 5.0 }]])
>           EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1) NOT NULL], A=[$t1]) <= no more A0
>             EnumerableValues(tuples=[[{ 1 }, { 2 }]])
>     EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1)], A=[$t1])
>       EnumerableValues(tuples=[[{ 1.0 }, { 4.0 }, { null }]]){noformat}
> [This PR discussion|https://github.com/apache/calcite/pull/4557#discussion_r2384022473]
> elaborates further on why this is needed.
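
The "compute once and reuse" idea from the description can be illustrated with a small Calcite-free sketch. Everything here is hypothetical: the class `JoinKeySketch`, its methods, and the string-based type model are invented for illustration and are not Calcite APIs or the rule's actual code. The input column types other than `DECIMAL(11, 1)` are made up, and nullability is ignored. The point is only that each operand's keys are derived a single time against the shared target row type, rather than once per pair of operands.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the idea; none of these names exist in Calcite. */
final class JoinKeySketch {

  /**
   * Key expressions for one input: reference each column directly, or wrap
   * it in a cast when its type differs from the target row type.
   */
  static List<String> keysFor(List<String> inputTypes, List<String> targetTypes) {
    if (inputTypes.size() != targetTypes.size()) {
      throw new IllegalArgumentException("column count mismatch");
    }
    List<String> keys = new ArrayList<>();
    for (int i = 0; i < inputTypes.size(); i++) {
      String ref = "$" + i;
      String target = targetTypes.get(i);
      keys.add(inputTypes.get(i).equals(target) ? ref : "CAST(" + ref + "):" + target);
    }
    return keys;
  }

  /** Computes the keys once per input, always against the same target row type. */
  static List<List<String>> precomputeKeys(
      List<List<String>> inputs, List<String> targetTypes) {
    List<List<String>> all = new ArrayList<>();
    for (List<String> input : inputs) {
      all.add(keysFor(input, targetTypes));
    }
    return all;
  }

  public static void main(String[] args) {
    // Target type taken from the plans above; input types are illustrative.
    List<String> target = List.of("DECIMAL(11, 1)");
    List<List<String>> inputs = List.of(
        List.of("DECIMAL(2, 1)"), List.of("INTEGER"), List.of("DECIMAL(11, 1)"));
    System.out.println(precomputeKeys(inputs, target));
  }
}
```

Because every operand is cast to the same final type, each semi-join in the chain can reuse the precomputed key list of its inputs instead of introducing a fresh pairwise cast (and the extra projected column that comes with it).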