[ 
https://issues.apache.org/jira/browse/CALCITE-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Sysolyatin updated CALCITE-5209:
---------------------------------------
    Description: 
For these examples assume that:
 - {{`SqlToRelConverter.Config.config().withInSubQueryThreshold(5)`}} has been 
set.
 - The source table has 4 columns (Column_0 through Column_3)
 - Each column is of type BIGINT NOT NULL

The failing query is:
{code:java}
select
    case
        when Column_3 in (1, 2, 3, 4, 5) THEN 1
        else 0
    end
from T1000
group by
    case
        when Column_3 in (1, 2, 3, 4, 5) THEN 1
        else 0
    end
{code}
The exception is:
{code:java}
3
java.lang.ArrayIndexOutOfBoundsException: 3
        at 
com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:75)
        at org.apache.calcite.tools.RelBuilder.inferAlias(RelBuilder.java:2163)
        at org.apache.calcite.tools.RelBuilder.project_(RelBuilder.java:1956)
        at org.apache.calcite.tools.RelBuilder.project(RelBuilder.java:1797)
        at org.apache.calcite.tools.RelBuilder.project(RelBuilder.java:1769)
        at org.apache.calcite.tools.RelBuilder.project(RelBuilder.java:1758)
        at 
org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.register(SqlToRelConverter.java:4680)
        at 
org.apache.calcite.sql2rel.SqlToRelConverter.substituteSubQuery(SqlToRelConverter.java:1268)
        at 
org.apache.calcite.sql2rel.SqlToRelConverter.replaceSubQueries(SqlToRelConverter.java:1127)
        at 
org.apache.calcite.sql2rel.SqlToRelConverter.createAggImpl(SqlToRelConverter.java:3325)
        at 
org.apache.calcite.sql2rel.SqlToRelConverter.convertAgg(SqlToRelConverter.java:3192)
        at 
org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectImpl(SqlToRelConverter.java:738)
        at 
org.apache.calcite.sql2rel.SqlToRelConverter.convertSelect(SqlToRelConverter.java:664)
        at 
org.apache.calcite.sql2rel.SqlToRelConverter.convertQueryRecursive(SqlToRelConverter.java:3589)
        at 
org.apache.calcite.sql2rel.SqlToRelConverter.convertQuery(SqlToRelConverter.java:589)
        at org.apache.calcite.prepare.PlannerImpl.rel(PlannerImpl.java:259)
{code}
The table could contain N >= 4 columns, and the use of any column at or after 
index 3 will also cause this exception.
The use of any column before index 3 gives the RelNode tree:
{code:java}
66:LogicalProject(EXPR$0=[$0])
  65:LogicalJoin(condition=[=($3, $4)], joinType=[left])
    61:LogicalProject(EXPR$0=[$0], $f0=[$1], $f1=[$2], $f10=[$2])
      60:LogicalJoin(condition=[true], joinType=[inner])
        56:LogicalAggregate(group=[{0}])
          55:LogicalProject(EXPR$0=[CASE(CAST(OR(AND(IS NOT NULL($8), <>($4, 
0)), AND(<($5, $4), null, <>($4, 0), IS NULL($8)))):BOOLEAN NOT NULL, 1, 0)])
            54:LogicalJoin(condition=[=($6, $7)], joinType=[left])
              50:LogicalProject(Column_0=[$0], Column_1=[$1], Column_2=[$2], 
Column_3=[$3], $f0=[$4], $f1=[$5], Column_20=[$2])
                49:LogicalJoin(condition=[true], joinType=[inner])
                  45:TableScan(...)
                  48:LogicalAggregate(group=[{}], agg#0=[COUNT()], 
agg#1=[COUNT($0)])
                    47:LogicalProject(ROW_VALUE=[$0], $f1=[true])
                      46:LogicalValues(tuples=[[{ 1 }, { 2 }, { 3 }, { 4 }, { 5 
}]])
              53:LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
                52:LogicalProject(ROW_VALUE=[$0], $f1=[true])
                  51:LogicalValues(tuples=[[{ 1 }, { 2 }, { 3 }, { 4 }, { 5 }]])
        59:LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
          58:LogicalProject(ROW_VALUE=[$0], $f1=[true])
            57:LogicalValues(tuples=[[{ 1 }, { 2 }, { 3 }, { 4 }, { 5 }]])
    64:LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
      63:LogicalProject(ROW_VALUE=[$0], $f1=[true])
        62:LogicalValues(tuples=[[{ 1 }, { 2 }, { 3 }, { 4 }, { 5 }]])
{code}
If the number of predicates is less than the subquery threshold, there are no 
issues.

Additionally, the below queries also produce RelNode trees
Using an alias:
{code:java}
select
    case
        when Column_3 in (1, 2, 3, 4, 5) THEN 1
        else 0
    end as CASE_ALIAS
from T1000
group by
    CASE_ALIAS
{code}
{code:java}
90:LogicalAggregate(group=[{0}])
  89:LogicalProject(CASE_ALIAS=[CASE(CAST(OR(AND(IS NOT NULL($8), <>($4, 0)), 
AND(<($5, $4), null, <>($4, 0), IS NULL($8)))):BOOLEAN NOT NULL, 1, 0)])
    88:LogicalJoin(condition=[=($6, $7)], joinType=[left])
      84:LogicalProject(Column_0=[$0], Column_1=[$1], Column_2=[$2], 
Column_3=[$3], $f0=[$4], $f1=[$5], Column_30=[$3])
        83:LogicalJoin(condition=[true], joinType=[inner])
          79:TableScan(...)
          82:LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
            81:LogicalProject(ROW_VALUE=[$0], $f1=[true])
              80:LogicalValues(tuples=[[{ 1 }, { 2 }, { 3 }, { 4 }, { 5 }]])
      87:LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
        86:LogicalProject(ROW_VALUE=[$0], $f1=[true])
          85:LogicalValues(tuples=[[{ 1 }, { 2 }, { 3 }, { 4 }, { 5 }]])
{code}
Not using the group-by value in the projection:
{code:java}
select
    count(*)
from T1000
group by
    case
        when Column_3 in (1, 2, 3, 4, 5) THEN 1
        else 0
    end
{code}
{code:java}
116:LogicalProject(EXPR$0=[$1])
  115:LogicalAggregate(group=[{0}], EXPR$0=[COUNT()])
    114:LogicalProject($f0=[CASE(CAST(OR(AND(IS NOT NULL($8), <>($4, 0)), 
AND(<($5, $4), null, <>($4, 0), IS NULL($8)))):BOOLEAN NOT NULL, 1, 0)])
      113:LogicalJoin(condition=[=($6, $7)], joinType=[left])
        109:LogicalProject(Column_0=[$0], Column_1=[$1], Column_2=[$2], 
Column_3=[$3], $f0=[$4], $f1=[$5], Column_30=[$3])
          108:LogicalJoin(condition=[true], joinType=[inner])
            104:TableScan(...)
            107:LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
              106:LogicalProject(ROW_VALUE=[$0], $f1=[true])
                105:LogicalValues(tuples=[[{ 1 }, { 2 }, { 3 }, { 4 }, { 5 }]])
        112:LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
          111:LogicalProject(ROW_VALUE=[$0], $f1=[true])
            110:LogicalValues(tuples=[[{ 1 }, { 2 }, { 3 }, { 4 }, { 5 }]])
{code}

  was:
For these examples assume that:
- {{`SqlToRelConverter.Config.config().withInSubQueryThreshold(5)`}} has been 
set.
- The source table has 4 columns (Column_0 through Column_3)
- Each column is of type BIGINT NOT NULL

The failing query is:
{code}
select
    case
        when Column_3 in (1, 2, 3, 4, 5) THEN 1
        else 0
    end
from T1000
group by
    case
        when Column_3 in (1, 2, 3, 4, 5) THEN 1
        else 0
    end
{code}

The exception is:
{code}
3
java.lang.ArrayIndexOutOfBoundsException: 3
        at 
com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:75)
        at org.apache.calcite.tools.RelBuilder.inferAlias(RelBuilder.java:2163)
        at org.apache.calcite.tools.RelBuilder.project_(RelBuilder.java:1956)
        at org.apache.calcite.tools.RelBuilder.project(RelBuilder.java:1797)
        at org.apache.calcite.tools.RelBuilder.project(RelBuilder.java:1769)
        at org.apache.calcite.tools.RelBuilder.project(RelBuilder.java:1758)
        at 
org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.register(SqlToRelConverter.java:4680)
        at 
org.apache.calcite.sql2rel.SqlToRelConverter.substituteSubQuery(SqlToRelConverter.java:1268)
        at 
org.apache.calcite.sql2rel.SqlToRelConverter.replaceSubQueries(SqlToRelConverter.java:1127)
        at 
org.apache.calcite.sql2rel.SqlToRelConverter.createAggImpl(SqlToRelConverter.java:3325)
        at 
org.apache.calcite.sql2rel.SqlToRelConverter.convertAgg(SqlToRelConverter.java:3192)
        at 
org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectImpl(SqlToRelConverter.java:738)
        at 
org.apache.calcite.sql2rel.SqlToRelConverter.convertSelect(SqlToRelConverter.java:664)
        at 
org.apache.calcite.sql2rel.SqlToRelConverter.convertQueryRecursive(SqlToRelConverter.java:3589)
        at 
org.apache.calcite.sql2rel.SqlToRelConverter.convertQuery(SqlToRelConverter.java:589)
        at org.apache.calcite.prepare.PlannerImpl.rel(PlannerImpl.java:259)
{code}

The table could contain N >= 4 columns, and the use of any column at or after 
index 3 will also cause this exception.
The use of any column  before index 3 gives the RelNode tree:
{code}
66:LogicalProject(EXPR$0=[$0])
  65:LogicalJoin(condition=[=($3, $4)], joinType=[left])
    61:LogicalProject(EXPR$0=[$0], $f0=[$1], $f1=[$2], $f10=[$2])
      60:LogicalJoin(condition=[true], joinType=[inner])
        56:LogicalAggregate(group=[{0}])
          55:LogicalProject(EXPR$0=[CASE(CAST(OR(AND(IS NOT NULL($8), <>($4, 
0)), AND(<($5, $4), null, <>($4, 0), IS NULL($8)))):BOOLEAN NOT NULL, 1, 0)])
            54:LogicalJoin(condition=[=($6, $7)], joinType=[left])
              50:LogicalProject(Column_0=[$0], Column_1=[$1], Column_2=[$2], 
Column_3=[$3], $f0=[$4], $f1=[$5], Column_20=[$2])
                49:LogicalJoin(condition=[true], joinType=[inner])
                  45:TableScan(...)
                  48:LogicalAggregate(group=[{}], agg#0=[COUNT()], 
agg#1=[COUNT($0)])
                    47:LogicalProject(ROW_VALUE=[$0], $f1=[true])
                      46:LogicalValues(tuples=[[{ 1 }, { 2 }, { 3 }, { 4 }, { 5 
}]])
              53:LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
                52:LogicalProject(ROW_VALUE=[$0], $f1=[true])
                  51:LogicalValues(tuples=[[{ 1 }, { 2 }, { 3 }, { 4 }, { 5 }]])
        59:LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
          58:LogicalProject(ROW_VALUE=[$0], $f1=[true])
            57:LogicalValues(tuples=[[{ 1 }, { 2 }, { 3 }, { 4 }, { 5 }]])
    64:LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
      63:LogicalProject(ROW_VALUE=[$0], $f1=[true])
        62:LogicalValues(tuples=[[{ 1 }, { 2 }, { 3 }, { 4 }, { 5 }]])
{code}

If the number of predicates is less than the subquery threshold, there are no 
issues.

Additionally, the below queries also produce RelNode trees
Using an alias:
{code}
select
    case
        when Column_3 in (1, 2, 3, 4, 5) THEN 1
        else 0
    end as CASE_ALIAS
from T1000
group by
    CASE_ALIAS
{code}
{code}
90:LogicalAggregate(group=[{0}])
  89:LogicalProject(CASE_ALIAS=[CASE(CAST(OR(AND(IS NOT NULL($8), <>($4, 0)), 
AND(<($5, $4), null, <>($4, 0), IS NULL($8)))):BOOLEAN NOT NULL, 1, 0)])
    88:LogicalJoin(condition=[=($6, $7)], joinType=[left])
      84:LogicalProject(Column_0=[$0], Column_1=[$1], Column_2=[$2], 
Column_3=[$3], $f0=[$4], $f1=[$5], Column_30=[$3])
        83:LogicalJoin(condition=[true], joinType=[inner])
          79:TableScan(...)
          82:LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
            81:LogicalProject(ROW_VALUE=[$0], $f1=[true])
              80:LogicalValues(tuples=[[{ 1 }, { 2 }, { 3 }, { 4 }, { 5 }]])
      87:LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
        86:LogicalProject(ROW_VALUE=[$0], $f1=[true])
          85:LogicalValues(tuples=[[{ 1 }, { 2 }, { 3 }, { 4 }, { 5 }]])
{code}

Not using the group-by value in the projection:
{code}
select
    count(*)
from T1000
group by
    case
        when Column_3 in (1, 2, 3, 4, 5) THEN 1
        else 0
    end
{code}
{code}
116:LogicalProject(EXPR$0=[$1])
  115:LogicalAggregate(group=[{0}], EXPR$0=[COUNT()])
    114:LogicalProject($f0=[CASE(CAST(OR(AND(IS NOT NULL($8), <>($4, 0)), 
AND(<($5, $4), null, <>($4, 0), IS NULL($8)))):BOOLEAN NOT NULL, 1, 0)])
      113:LogicalJoin(condition=[=($6, $7)], joinType=[left])
        109:LogicalProject(Column_0=[$0], Column_1=[$1], Column_2=[$2], 
Column_3=[$3], $f0=[$4], $f1=[$5], Column_30=[$3])
          108:LogicalJoin(condition=[true], joinType=[inner])
            104:TableScan(...)
            107:LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
              106:LogicalProject(ROW_VALUE=[$0], $f1=[true])
                105:LogicalValues(tuples=[[{ 1 }, { 2 }, { 3 }, { 4 }, { 5 }]])
        112:LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
          111:LogicalProject(ROW_VALUE=[$0], $f1=[true])
            110:LogicalValues(tuples=[[{ 1 }, { 2 }, { 3 }, { 4 }, { 5 }]])
{code}





> Proper sub-query handling if it is used inside select list and group by
> -----------------------------------------------------------------------
>
>                 Key: CALCITE-5209
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5209
>             Project: Calcite
>          Issue Type: Bug
>    Affects Versions: 1.30.0
>            Reporter: Ian Bertolacci
>            Assignee: Dmitry Sysolyatin
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> For these examples assume that:
>  - {{`SqlToRelConverter.Config.config().withInSubQueryThreshold(5)`}} has 
> been set.
>  - The source table has 4 columns (Column_0 through Column_3)
>  - Each column is of type BIGINT NOT NULL
> The failing query is:
> {code:java}
> select
>     case
>         when Column_3 in (1, 2, 3, 4, 5) THEN 1
>         else 0
>     end
> from T1000
> group by
>     case
>         when Column_3 in (1, 2, 3, 4, 5) THEN 1
>         else 0
>     end
> {code}
> The exception is:
> {code:java}
> 3
> java.lang.ArrayIndexOutOfBoundsException: 3
>       at 
> com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:75)
>       at org.apache.calcite.tools.RelBuilder.inferAlias(RelBuilder.java:2163)
>       at org.apache.calcite.tools.RelBuilder.project_(RelBuilder.java:1956)
>       at org.apache.calcite.tools.RelBuilder.project(RelBuilder.java:1797)
>       at org.apache.calcite.tools.RelBuilder.project(RelBuilder.java:1769)
>       at org.apache.calcite.tools.RelBuilder.project(RelBuilder.java:1758)
>       at 
> org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.register(SqlToRelConverter.java:4680)
>       at 
> org.apache.calcite.sql2rel.SqlToRelConverter.substituteSubQuery(SqlToRelConverter.java:1268)
>       at 
> org.apache.calcite.sql2rel.SqlToRelConverter.replaceSubQueries(SqlToRelConverter.java:1127)
>       at 
> org.apache.calcite.sql2rel.SqlToRelConverter.createAggImpl(SqlToRelConverter.java:3325)
>       at 
> org.apache.calcite.sql2rel.SqlToRelConverter.convertAgg(SqlToRelConverter.java:3192)
>       at 
> org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectImpl(SqlToRelConverter.java:738)
>       at 
> org.apache.calcite.sql2rel.SqlToRelConverter.convertSelect(SqlToRelConverter.java:664)
>       at 
> org.apache.calcite.sql2rel.SqlToRelConverter.convertQueryRecursive(SqlToRelConverter.java:3589)
>       at 
> org.apache.calcite.sql2rel.SqlToRelConverter.convertQuery(SqlToRelConverter.java:589)
>       at org.apache.calcite.prepare.PlannerImpl.rel(PlannerImpl.java:259)
> {code}
> The table could contain N >= 4 columns, and the use of any column at or after 
> index 3 will also cause this exception.
> The use of any column before index 3 gives the RelNode tree:
> {code:java}
> 66:LogicalProject(EXPR$0=[$0])
>   65:LogicalJoin(condition=[=($3, $4)], joinType=[left])
>     61:LogicalProject(EXPR$0=[$0], $f0=[$1], $f1=[$2], $f10=[$2])
>       60:LogicalJoin(condition=[true], joinType=[inner])
>         56:LogicalAggregate(group=[{0}])
>           55:LogicalProject(EXPR$0=[CASE(CAST(OR(AND(IS NOT NULL($8), <>($4, 
> 0)), AND(<($5, $4), null, <>($4, 0), IS NULL($8)))):BOOLEAN NOT NULL, 1, 0)])
>             54:LogicalJoin(condition=[=($6, $7)], joinType=[left])
>               50:LogicalProject(Column_0=[$0], Column_1=[$1], Column_2=[$2], 
> Column_3=[$3], $f0=[$4], $f1=[$5], Column_20=[$2])
>                 49:LogicalJoin(condition=[true], joinType=[inner])
>                   45:TableScan(...)
>                   48:LogicalAggregate(group=[{}], agg#0=[COUNT()], 
> agg#1=[COUNT($0)])
>                     47:LogicalProject(ROW_VALUE=[$0], $f1=[true])
>                       46:LogicalValues(tuples=[[{ 1 }, { 2 }, { 3 }, { 4 }, { 
> 5 }]])
>               53:LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
>                 52:LogicalProject(ROW_VALUE=[$0], $f1=[true])
>                   51:LogicalValues(tuples=[[{ 1 }, { 2 }, { 3 }, { 4 }, { 5 
> }]])
>         59:LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
>           58:LogicalProject(ROW_VALUE=[$0], $f1=[true])
>             57:LogicalValues(tuples=[[{ 1 }, { 2 }, { 3 }, { 4 }, { 5 }]])
>     64:LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
>       63:LogicalProject(ROW_VALUE=[$0], $f1=[true])
>         62:LogicalValues(tuples=[[{ 1 }, { 2 }, { 3 }, { 4 }, { 5 }]])
> {code}
> If the number of predicates is less than the subquery threshold, there are no 
> issues.
> Additionally, the below queries also produce RelNode trees
> Using an alias:
> {code:java}
> select
>     case
>         when Column_3 in (1, 2, 3, 4, 5) THEN 1
>         else 0
>     end as CASE_ALIAS
> from T1000
> group by
>     CASE_ALIAS
> {code}
> {code:java}
> 90:LogicalAggregate(group=[{0}])
>   89:LogicalProject(CASE_ALIAS=[CASE(CAST(OR(AND(IS NOT NULL($8), <>($4, 0)), 
> AND(<($5, $4), null, <>($4, 0), IS NULL($8)))):BOOLEAN NOT NULL, 1, 0)])
>     88:LogicalJoin(condition=[=($6, $7)], joinType=[left])
>       84:LogicalProject(Column_0=[$0], Column_1=[$1], Column_2=[$2], 
> Column_3=[$3], $f0=[$4], $f1=[$5], Column_30=[$3])
>         83:LogicalJoin(condition=[true], joinType=[inner])
>           79:TableScan(...)
>           82:LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
>             81:LogicalProject(ROW_VALUE=[$0], $f1=[true])
>               80:LogicalValues(tuples=[[{ 1 }, { 2 }, { 3 }, { 4 }, { 5 }]])
>       87:LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
>         86:LogicalProject(ROW_VALUE=[$0], $f1=[true])
>           85:LogicalValues(tuples=[[{ 1 }, { 2 }, { 3 }, { 4 }, { 5 }]])
> {code}
> Not using the group-by value in the projection:
> {code:java}
> select
>     count(*)
> from T1000
> group by
>     case
>         when Column_3 in (1, 2, 3, 4, 5) THEN 1
>         else 0
>     end
> {code}
> {code:java}
> 116:LogicalProject(EXPR$0=[$1])
>   115:LogicalAggregate(group=[{0}], EXPR$0=[COUNT()])
>     114:LogicalProject($f0=[CASE(CAST(OR(AND(IS NOT NULL($8), <>($4, 0)), 
> AND(<($5, $4), null, <>($4, 0), IS NULL($8)))):BOOLEAN NOT NULL, 1, 0)])
>       113:LogicalJoin(condition=[=($6, $7)], joinType=[left])
>         109:LogicalProject(Column_0=[$0], Column_1=[$1], Column_2=[$2], 
> Column_3=[$3], $f0=[$4], $f1=[$5], Column_30=[$3])
>           108:LogicalJoin(condition=[true], joinType=[inner])
>             104:TableScan(...)
>             107:LogicalAggregate(group=[{}], agg#0=[COUNT()], 
> agg#1=[COUNT($0)])
>               106:LogicalProject(ROW_VALUE=[$0], $f1=[true])
>                 105:LogicalValues(tuples=[[{ 1 }, { 2 }, { 3 }, { 4 }, { 5 
> }]])
>         112:LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
>           111:LogicalProject(ROW_VALUE=[$0], $f1=[true])
>             110:LogicalValues(tuples=[[{ 1 }, { 2 }, { 3 }, { 4 }, { 5 }]])
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to