[ 
https://issues.apache.org/jira/browse/CALCITE-6452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17867133#comment-17867133
 ] 

TJ Banghart edited comment on CALCITE-6452 at 7/18/24 11:12 PM:
----------------------------------------------------------------

Okay great, so this makes sense and I think the issue for the first case is 
that the implicit inner join is losing `IS NULL` predicate when being converted 
to an {{EnumerableRel}}

Logical plan
{code:java}
LogicalProject(variablesSet=[[$cor0]], ENAME=[$1], C=[$SCALAR_QUERY({
LogicalAggregate(group=[{}], EXPR$0=[COUNT()])
  LogicalFilter(condition=[OR(AND(IS NULL($6), IS NULL($cor0.COMM)), IS 
TRUE(=($6, $cor0.COMM)))])
    LogicalTableScan(table=[[scott, EMP]])})])
  LogicalTableScan(table=[[scott, EMP]])
{code}

Physical plan
{code:java}
EnumerableCalc(expr#0..6=[{inputs}], expr#7=[IS NULL($t6)], expr#8=[0:BIGINT], 
expr#9=[CASE($t7, $t8, $t6)], ENAME=[$t1], C=[$t9])
  EnumerableMergeJoin(condition=[AND(=($2, $4), =($3, $5))], joinType=[left])
    EnumerableSort(sort0=[$2], sort1=[$3], dir0=[ASC], dir1=[ASC])
      EnumerableCalc(expr#0..7=[{inputs}], expr#8=[IS NULL($t6)], 
proj#0..1=[{exprs}], COMM=[$t6], $f8=[$t8])
        EnumerableTableScan(table=[[scott, EMP]])
    EnumerableSort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
      EnumerableAggregate(group=[{2, 3}], EXPR$0=[COUNT()])
        EnumerableNestedLoopJoin(condition=[OR(AND(IS NULL($1), $3), =($1, 
$2))], joinType=[inner])
          EnumerableCalc(expr#0..7=[{inputs}], EMPNO=[$t0], COMM=[$t6])
            EnumerableTableScan(table=[[scott, EMP]])
          EnumerableAggregate(group=[{0, 1}])
            EnumerableCalc(expr#0..7=[{inputs}], expr#8=[IS NULL($t6)], 
COMM=[$t6], $f8=[$t8])
              EnumerableTableScan(table=[[scott, EMP]])
{code}

Notice that {{OR(AND(IS NULL($6), IS NULL($cor0.COMM))}} becomes {{OR(AND(IS 
NULL($1), $3))}} before the {{EnumerableAggregate}}. Will continue looking into 
it.



was (Author: tjbanghart):
Okay great, so this makes sense and I think the issue for the first case is 
that the implicit inner join is losing `IS NULL` predicate when being converted 
to an {{EnumerableRel}}

Logical plan
{code:java}
LogicalProject(variablesSet=[[$cor0]], ENAME=[$1], C=[$SCALAR_QUERY({
LogicalAggregate(group=[{}], EXPR$0=[COUNT()])
  LogicalFilter(condition=[OR(AND(IS NULL($6), IS NULL($cor0.COMM)), IS 
TRUE(=($6, $cor0.COMM)))])
    LogicalTableScan(table=[[scott, EMP]])})])
  LogicalTableScan(table=[[scott, EMP]])
{code}

Physical plan
{code:java}
EnumerableCalc(expr#0..6=[{inputs}], expr#7=[IS NULL($t6)], expr#8=[0:BIGINT], 
expr#9=[CASE($t7, $t8, $t6)], ENAME=[$t1], C=[$t9])
  EnumerableMergeJoin(condition=[AND(=($2, $4), =($3, $5))], joinType=[left])
    EnumerableSort(sort0=[$2], sort1=[$3], dir0=[ASC], dir1=[ASC])
      EnumerableCalc(expr#0..7=[{inputs}], expr#8=[IS NULL($t6)], 
proj#0..1=[{exprs}], COMM=[$t6], $f8=[$t8])
        EnumerableTableScan(table=[[scott, EMP]])
    EnumerableSort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
      EnumerableAggregate(group=[{2, 3}], EXPR$0=[COUNT()])
        EnumerableNestedLoopJoin(condition=[OR(AND(IS NULL($1), $3), =($1, 
$2))], joinType=[inner])
          EnumerableCalc(expr#0..7=[{inputs}], EMPNO=[$t0], COMM=[$t6])
            EnumerableTableScan(table=[[scott, EMP]])
          EnumerableAggregate(group=[{0, 1}])
            EnumerableCalc(expr#0..7=[{inputs}], expr#8=[IS NULL($t6)], 
COMM=[$t6], $f8=[$t8])
              EnumerableTableScan(table=[[scott, EMP]])
{code}

Notice that {{OR(AND(IS NULL($6), IS NULL($cor0.COMM))}} becomes {{OR(AND(IS 
NULL($1), $3))}}. Will continue looking into it.


> Scalar sub-query that uses IS NOT DISTINCT FROM returns incorrect result
> ------------------------------------------------------------------------
>
>                 Key: CALCITE-6452
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6452
>             Project: Calcite
>          Issue Type: Improvement
>            Reporter: Julian Hyde
>            Assignee: TJ Banghart
>            Priority: Major
>
> Scalar sub-query that uses an {{IS NOT DISTINCT FROM}} condition returns 
> incorrect result. For example,
> {code:java}
> select e.ename,
>   (select count(*)
>     from emp as f
>     where f.comm is not distinct from e.comm) as c
> from emp as e{code}
> returns
> {noformat}
> +--------+----+
> | ENAME  | C  |
> +--------+----+
> | ADAMS  |  0 |
> | ALLEN  |  1 |
> | BLAKE  |  0 |
> | CLARK  |  0 |
> | FORD   |  0 |
> | JAMES  |  0 |
> | JONES  |  0 |
> | KING   |  0 |
> | MARTIN |  1 |
> | MILLER |  0 |
> | SCOTT  |  0 |
> | SMITH  |  0 |
> | TURNER |  1 |
> | WARD   |  1 |
> +--------+----+
> (14 rows)
> {noformat}
> but should return
> {noformat}
> +--------+----+
> | ENAME  | C  |
> +--------+----+
> | ADAMS  |  0 |
> | ALLEN  | 10 |
> | BLAKE  |  0 |
> | CLARK  |  0 |
> | FORD   |  0 |
> | JAMES  |  0 |
> | JONES  |  0 |
> | KING   |  0 |
> | MARTIN | 10 |
> | MILLER |  0 |
> | SCOTT  |  0 |
> | SMITH  |  0 |
> | TURNER | 10 |
> | WARD   | 10 |
> +--------+----+
> (14 rows)
> {noformat}
> Also, and perhaps related, if we add a {{WHERE}} to the above query, like 
> this:
> {code:java}
> select e.ename,
>   (select count(*)
>     from emp as f
>     where f.comm is not distinct from e.comm) as c
> from emp as e
> where e.deptno = 10{code}
> Calcite throws:
> {noformat}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 6 out of bounds 
> for length 5
> >     at 
> > com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:77)
> >     at org.apache.calcite.rex.RexBuilder.makeInputRef(RexBuilder.java:1037)
> >     at 
> > org.apache.calcite.sql2rel.RelDecorrelator$AdjustProjectForCountAggregateRule.lambda$onMatch2$4(RelDecorrelator.java:2679)
> {noformat}
> I ran into this error while trying to rewrite queries that had measures and 
> used {{ROLLUP}}. (Since {{ROLLUP}} will generate group keys whose value is 
> NULL, we cannot use regular {{=}} when doing a self-join.)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to