[
https://issues.apache.org/jira/browse/CALCITE-6452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17867133#comment-17867133
]
TJ Banghart edited comment on CALCITE-6452 at 7/18/24 11:12 PM:
----------------------------------------------------------------
Okay great, so this makes sense and I think the issue for the first case is
that the implicit inner join is losing `IS NULL` predicate when being converted
to an {{EnumerableRel}}
Logical plan
{code:java}
LogicalProject(variablesSet=[[$cor0]], ENAME=[$1], C=[$SCALAR_QUERY({
LogicalAggregate(group=[{}], EXPR$0=[COUNT()])
LogicalFilter(condition=[OR(AND(IS NULL($6), IS NULL($cor0.COMM)), IS
TRUE(=($6, $cor0.COMM)))])
LogicalTableScan(table=[[scott, EMP]])})])
LogicalTableScan(table=[[scott, EMP]])
{code}
Physical plan
{code:java}
EnumerableCalc(expr#0..6=[{inputs}], expr#7=[IS NULL($t6)], expr#8=[0:BIGINT],
expr#9=[CASE($t7, $t8, $t6)], ENAME=[$t1], C=[$t9])
EnumerableMergeJoin(condition=[AND(=($2, $4), =($3, $5))], joinType=[left])
EnumerableSort(sort0=[$2], sort1=[$3], dir0=[ASC], dir1=[ASC])
EnumerableCalc(expr#0..7=[{inputs}], expr#8=[IS NULL($t6)],
proj#0..1=[{exprs}], COMM=[$t6], $f8=[$t8])
EnumerableTableScan(table=[[scott, EMP]])
EnumerableSort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
EnumerableAggregate(group=[{2, 3}], EXPR$0=[COUNT()])
EnumerableNestedLoopJoin(condition=[OR(AND(IS NULL($1), $3), =($1,
$2))], joinType=[inner])
EnumerableCalc(expr#0..7=[{inputs}], EMPNO=[$t0], COMM=[$t6])
EnumerableTableScan(table=[[scott, EMP]])
EnumerableAggregate(group=[{0, 1}])
EnumerableCalc(expr#0..7=[{inputs}], expr#8=[IS NULL($t6)],
COMM=[$t6], $f8=[$t8])
EnumerableTableScan(table=[[scott, EMP]])
{code}
Notice that {{OR(AND(IS NULL($6), IS NULL($cor0.COMM))}} becomes {{OR(AND(IS
NULL($1), $3))}} before the {{EnumerableAggregate}}. Will continue looking into
it.
was (Author: tjbanghart):
Okay great, so this makes sense and I think the issue for the first case is
that the implicit inner join is losing `IS NULL` predicate when being converted
to an {{EnumerableRel}}
Logical plan
{code:java}
LogicalProject(variablesSet=[[$cor0]], ENAME=[$1], C=[$SCALAR_QUERY({
LogicalAggregate(group=[{}], EXPR$0=[COUNT()])
LogicalFilter(condition=[OR(AND(IS NULL($6), IS NULL($cor0.COMM)), IS
TRUE(=($6, $cor0.COMM)))])
LogicalTableScan(table=[[scott, EMP]])})])
LogicalTableScan(table=[[scott, EMP]])
{code}
Physical plan
{code:java}
EnumerableCalc(expr#0..6=[{inputs}], expr#7=[IS NULL($t6)], expr#8=[0:BIGINT],
expr#9=[CASE($t7, $t8, $t6)], ENAME=[$t1], C=[$t9])
EnumerableMergeJoin(condition=[AND(=($2, $4), =($3, $5))], joinType=[left])
EnumerableSort(sort0=[$2], sort1=[$3], dir0=[ASC], dir1=[ASC])
EnumerableCalc(expr#0..7=[{inputs}], expr#8=[IS NULL($t6)],
proj#0..1=[{exprs}], COMM=[$t6], $f8=[$t8])
EnumerableTableScan(table=[[scott, EMP]])
EnumerableSort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
EnumerableAggregate(group=[{2, 3}], EXPR$0=[COUNT()])
EnumerableNestedLoopJoin(condition=[OR(AND(IS NULL($1), $3), =($1,
$2))], joinType=[inner])
EnumerableCalc(expr#0..7=[{inputs}], EMPNO=[$t0], COMM=[$t6])
EnumerableTableScan(table=[[scott, EMP]])
EnumerableAggregate(group=[{0, 1}])
EnumerableCalc(expr#0..7=[{inputs}], expr#8=[IS NULL($t6)],
COMM=[$t6], $f8=[$t8])
EnumerableTableScan(table=[[scott, EMP]])
{code}
Notice that {{OR(AND(IS NULL($6), IS NULL($cor0.COMM))}} becomes {{OR(AND(IS
NULL($1), $3))}}. Will continue looking into it.
> Scalar sub-query that uses IS NOT DISTINCT FROM returns incorrect result
> ------------------------------------------------------------------------
>
> Key: CALCITE-6452
> URL: https://issues.apache.org/jira/browse/CALCITE-6452
> Project: Calcite
> Issue Type: Improvement
> Reporter: Julian Hyde
> Assignee: TJ Banghart
> Priority: Major
>
> Scalar sub-query that uses an {{IS NOT DISTINCT FROM}} condition returns
> incorrect result. For example,
> {code:java}
> select e.ename,
> (select count(*)
> from emp as f
> where f.comm is not distinct from e.comm) as c
> from emp as e{code}
> returns
> {noformat}
> +--------+----+
> | ENAME | C |
> +--------+----+
> | ADAMS | 0 |
> | ALLEN | 1 |
> | BLAKE | 0 |
> | CLARK | 0 |
> | FORD | 0 |
> | JAMES | 0 |
> | JONES | 0 |
> | KING | 0 |
> | MARTIN | 1 |
> | MILLER | 0 |
> | SCOTT | 0 |
> | SMITH | 0 |
> | TURNER | 1 |
> | WARD | 1 |
> +--------+----+
> (14 rows)
> {noformat}
> but should return
> {noformat}
> +--------+----+
> | ENAME | C |
> +--------+----+
> | ADAMS | 0 |
> | ALLEN | 10 |
> | BLAKE | 0 |
> | CLARK | 0 |
> | FORD | 0 |
> | JAMES | 0 |
> | JONES | 0 |
> | KING | 0 |
> | MARTIN | 10 |
> | MILLER | 0 |
> | SCOTT | 0 |
> | SMITH | 0 |
> | TURNER | 10 |
> | WARD | 10 |
> +--------+----+
> (14 rows)
> {noformat}
> Also, and perhaps related, if we add a {{WHERE}} to the above query, like
> this:
> {code:java}
> select e.ename,
> (select count(*)
> from emp as f
> where f.comm is not distinct from e.comm) as c
> from emp as e
> where e.deptno = 10{code}
> Calcite throws:
> {noformat}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 6 out of bounds
> for length 5
> > at
> > com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:77)
> > at org.apache.calcite.rex.RexBuilder.makeInputRef(RexBuilder.java:1037)
> > at
> > org.apache.calcite.sql2rel.RelDecorrelator$AdjustProjectForCountAggregateRule.lambda$onMatch2$4(RelDecorrelator.java:2679)
> {noformat}
> I ran into this error while trying to rewrite queries that had measures and
> used {{ROLLUP}}. (Since {{ROLLUP}} will generate group keys whose value is
> NULL, we cannot use regular {{=}} when doing a self-join.)
--
This message was sent by Atlassian Jira
(v8.20.10#820010)