[jira] [Commented] (CALCITE-6452) Scalar sub-query that uses IS NOT DISTINCT FROM returns incorrect result

Adam Markowitz (Jira) Thu, 12 Sep 2024 07:05:29 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-6452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881317#comment-17881317
 ]


Adam Markowitz commented on CALCITE-6452:
-----------------------------------------

After further investigation, the above physical plan looks correct if you follo 
the `$3` reference.

The final logical plan ended up being the following
{code:java}
LogicalProject(ENAME=[$1], C=[CASE(IS NULL($6), 0:BIGINT, $6)])
  LogicalJoin(condition=[AND(=($2, $4), =($3, $5))], joinType=[left])
    LogicalProject(EMPNO=[$0], ENAME=[$1], COMM=[$6], $f8=[IS NULL($6)])
      LogicalTableScan(table=[[scott, EMP]])
    LogicalAggregate(group=[{0, 1}], EXPR$0=[COUNT()])
      LogicalProject(COMM0=[$2], $f8=[$3])
        LogicalJoin(condition=[OR(AND(IS NULL($1), $3), =($1, $2))], 
joinType=[inner])
          LogicalProject(EMPNO=[$0], COMM=[$6])
            LogicalTableScan(table=[[scott, EMP]])
          LogicalAggregate(group=[{0, 1}])
            LogicalProject(COMM=[$6], $f8=[IS NULL($6)])
              LogicalTableScan(table=[[scott, EMP]]) {code}
and leveraging a `relToSql` on that yields:
{code:java}
SELECT t0.ENAME, CASE WHEN t5.EXPR$0 IS NULL THEN 0 ELSE t5.EXPR$0 END AS C
FROM (SELECT EMPNO, ENAME, COMM, COMM IS NULL AS $f8
FROM scott.EMP
ORDER BY COMM, 4) AS t0
LEFT JOIN (SELECT t3.COMM AS COMM0, t3.$f8, COUNT(*) AS EXPR$0
FROM (SELECT EMPNO, COMM
FROM scott.EMP) AS t1
INNER JOIN (SELECT COMM, COMM IS NULL AS $f8
FROM scott.EMP
GROUP BY COMM, COMM IS NULL) AS t3 ON t1.COMM IS NULL AND t3.$f8 OR t1.COMM = 
t3.COMM
GROUP BY t3.COMM, t3.$f8
ORDER BY t3.COMM, t3.$f8) AS t5 ON t0.COMM = t5.COMM0 AND t0.$f8 = t5.$f8 {code}
after further investigation into this with [~tjbanghart] all ephemeral 
relations are producing the correct results and it's the outer most join 
condition that appears incorrect. 

Changing it from`t0.COMM = t5.COMM0 AND t0.$f8 = t5.$f8` to `t0.COMM = t5.COMM0 
OR (t0.__f8 AND t5.__f8)` produces the expected results.

Why does this change work? - It still looks for equality on the values of COMM 
between the relations, however to mimic the IS NOT DISTINCT for _null_ value 
checks we must only check for TRUE in both f8 references as checking for 
equality will give us false positive matches when each value of COMM are not 
null, but also may not be equal.

> Scalar sub-query that uses IS NOT DISTINCT FROM returns incorrect result
> ------------------------------------------------------------------------
>
>                 Key: CALCITE-6452
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6452
>             Project: Calcite
>          Issue Type: Improvement
>            Reporter: Julian Hyde
>            Assignee: Adam Markowitz
>            Priority: Major
>
> Scalar sub-query that uses an {{IS NOT DISTINCT FROM}} condition returns 
> incorrect result. For example,
> {code:java}
> select e.ename,
>   (select count(*)
>     from emp as f
>     where f.comm is not distinct from e.comm) as c
> from emp as e{code}
> returns
> {noformat}
> +--------+----+
> | ENAME  | C  |
> +--------+----+
> | ADAMS  |  0 |
> | ALLEN  |  1 |
> | BLAKE  |  0 |
> | CLARK  |  0 |
> | FORD   |  0 |
> | JAMES  |  0 |
> | JONES  |  0 |
> | KING   |  0 |
> | MARTIN |  1 |
> | MILLER |  0 |
> | SCOTT  |  0 |
> | SMITH  |  0 |
> | TURNER |  1 |
> | WARD   |  1 |
> +--------+----+
> (14 rows)
> {noformat}
> but should return
> {noformat}
> +--------+----+
> | ENAME  | C  |
> +--------+----+
> | ADAMS  |  0 |
> | ALLEN  | 10 |
> | BLAKE  |  0 |
> | CLARK  |  0 |
> | FORD   |  0 |
> | JAMES  |  0 |
> | JONES  |  0 |
> | KING   |  0 |
> | MARTIN | 10 |
> | MILLER |  0 |
> | SCOTT  |  0 |
> | SMITH  |  0 |
> | TURNER | 10 |
> | WARD   | 10 |
> +--------+----+
> (14 rows)
> {noformat}
> Also, and perhaps related, if we add a {{WHERE}} to the above query, like 
> this:
> {code:java}
> select e.ename,
>   (select count(*)
>     from emp as f
>     where f.comm is not distinct from e.comm) as c
> from emp as e
> where e.deptno = 10{code}
> Calcite throws:
> {noformat}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 6 out of bounds 
> for length 5
> >     at 
> > com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:77)
> >     at org.apache.calcite.rex.RexBuilder.makeInputRef(RexBuilder.java:1037)
> >     at 
> > org.apache.calcite.sql2rel.RelDecorrelator$AdjustProjectForCountAggregateRule.lambda$onMatch2$4(RelDecorrelator.java:2679)
> {noformat}
> I ran into this error while trying to rewrite queries that had measures and 
> used {{ROLLUP}}. (Since {{ROLLUP}} will generate group keys whose value is 
> NULL, we cannot use regular {{=}} when doing a self-join.)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-6452) Scalar sub-query that uses IS NOT DISTINCT FROM returns incorrect result

Reply via email to