[jira] [Comment Edited] (CALCITE-4887) Filter subQuery remove not fully optimized

bingfeng.guo (Jira) Thu, 18 Nov 2021 01:15:06 -0800


    [ 
https://issues.apache.org/jira/browse/CALCITE-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17445754#comment-17445754
 ]


bingfeng.guo edited comment on CALCITE-4887 at 11/18/21, 9:14 AM:
------------------------------------------------------------------

https://issues.apache.org/jira/browse/CALCITE-2071  I know from this issue.

The extra part in the plan is to prevent incorrect results, such as null 
values. However, the currently generated plan has an no condition inner join, 
which is equivalent to a cross join, which has a great impact on performance.

 

The following is an example in my UT:

sql:
{quote}select count(1)
from struct.t t1 left join emp on t1.c0 = emp.deptno
 where c0 = 1 or c0 in (
select deptno from emp where deptno = 1)
{quote}
 

expected:
{quote}LogicalAggregate(group=[{}], EXPR$0=[COUNT()])
  LogicalProject($f0=[1])
    LogicalFilter(condition=[OR(=($4, 1), true)])
      LogicalJoin(condition=[=($4, $18)], joinType=[left])
        LogicalJoin(condition=[=($4, $16)], joinType=[left])
          LogicalTableScan(table=[[CATALOG, STRUCT, T]])
          LogicalTableScan(table=[[CATALOG, SALES, EMP]])
        LogicalAggregate(group=[
Unknown macro: \{0}
])
          LogicalProject(DEPTNO=[$7])
            LogicalFilter(condition=[=($7, 1)])
              LogicalTableScan(table=[[CATALOG, SALES, EMP]])
{quote}
 

actual:
{quote}LogicalAggregate(group=[{}], EXPR$0=[COUNT()])
  LogicalProject($f0=[1])
    LogicalFilter(condition=[OR(=($4, 1), CAST(CASE(=($18, 0), false, IS NOT 
NULL($22), true, IS NULL($20), null:BOOLEAN, <($19, $18), null:BOOLEAN, 
false)):BOOLEAN NOT NULL)])
      LogicalJoin(condition=[=($20, $21)], joinType=[left])
        LogicalProject("K0"=[$0], "C1"=[$1], "F1"."A0"=[$2], "F2"."A0"=[$3], 
"F0"."C0"=[$4], "F1"."C0"=[$5], "F0"."C1"=[$6], "F1"."C2"=[$7], "F2"."C3"=[$8], 
EMPNO=[$9], ENAME=[$10], JOB=[$11], MGR=[$12], HIREDATE=[$13], SAL=[$14], 
COMM=[$15], DEPTNO=[$16], SLACKER=[$17], $f0=[$18], $f1=[$19], "F0"."C0"0=[$4])
          LogicalJoin(condition=[true], joinType=[inner])
            LogicalJoin(condition=[=($4, $16)], joinType=[left])
              LogicalTableScan(table=[[CATALOG, STRUCT, T]])
              LogicalTableScan(table=[[CATALOG, SALES, EMP]])
            LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
              LogicalProject(DEPTNO=[$7], $f1=[true])
                LogicalFilter(condition=[=($7, 1)])
                  LogicalTableScan(table=[[CATALOG, SALES, EMP]])
        LogicalAggregate(group=[
Unknown macro: \{0}
], agg#0=[MIN($1)])
          LogicalProject(DEPTNO=[$7], $f1=[true])
            LogicalFilter(condition=[=($7, 1)])
              LogicalTableScan(table=[[CATALOG, SALES, EMP]])
{quote}


was (Author: bingfeng):
https://issues.apache.org/jira/browse/CALCITE-2071  I know from this issue.

The extra part in the plan is to prevent incorrect results, such as null 
values. However, the currently generated plan has an unconditional inner join, 
which is equivalent to a cross join, which has a great impact on performance.

 

The following is an example in my UT:

sql:
{quote}select count(1)
from struct.t t1 left join emp on t1.c0 = emp.deptno
 where c0 = 1 or c0 in (
select deptno from emp where deptno = 1)
{quote}
 

expected:
{quote}LogicalAggregate(group=[{}], EXPR$0=[COUNT()])
  LogicalProject($f0=[1])
    LogicalFilter(condition=[OR(=($4, 1), true)])
      LogicalJoin(condition=[=($4, $18)], joinType=[left])
        LogicalJoin(condition=[=($4, $16)], joinType=[left])
          LogicalTableScan(table=[[CATALOG, STRUCT, T]])
          LogicalTableScan(table=[[CATALOG, SALES, EMP]])
        LogicalAggregate(group=[\\{0}])
          LogicalProject(DEPTNO=[$7])
            LogicalFilter(condition=[=($7, 1)])
              LogicalTableScan(table=[[CATALOG, SALES, EMP]])
{quote}
 

actual:
{quote}LogicalAggregate(group=[{}], EXPR$0=[COUNT()])
  LogicalProject($f0=[1])
    LogicalFilter(condition=[OR(=($4, 1), CAST(CASE(=($18, 0), false, IS NOT 
NULL($22), true, IS NULL($20), null:BOOLEAN, <($19, $18), null:BOOLEAN, 
false)):BOOLEAN NOT NULL)])
      LogicalJoin(condition=[=($20, $21)], joinType=[left])
        LogicalProject("K0"=[$0], "C1"=[$1], "F1"."A0"=[$2], "F2"."A0"=[$3], 
"F0"."C0"=[$4], "F1"."C0"=[$5], "F0"."C1"=[$6], "F1"."C2"=[$7], "F2"."C3"=[$8], 
EMPNO=[$9], ENAME=[$10], JOB=[$11], MGR=[$12], HIREDATE=[$13], SAL=[$14], 
COMM=[$15], DEPTNO=[$16], SLACKER=[$17], $f0=[$18], $f1=[$19], "F0"."C0"0=[$4])
          LogicalJoin(condition=[true], joinType=[inner])
            LogicalJoin(condition=[=($4, $16)], joinType=[left])
              LogicalTableScan(table=[[CATALOG, STRUCT, T]])
              LogicalTableScan(table=[[CATALOG, SALES, EMP]])
            LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
              LogicalProject(DEPTNO=[$7], $f1=[true])
                LogicalFilter(condition=[=($7, 1)])
                  LogicalTableScan(table=[[CATALOG, SALES, EMP]])
        LogicalAggregate(group=[\\{0}], agg#0=[MIN($1)])
          LogicalProject(DEPTNO=[$7], $f1=[true])
            LogicalFilter(condition=[=($7, 1)])
              LogicalTableScan(table=[[CATALOG, SALES, EMP]])
{quote}

> Filter subQuery remove not fully optimized
> ------------------------------------------
>
>                 Key: CALCITE-4887
>                 URL: https://issues.apache.org/jira/browse/CALCITE-4887
>             Project: Calcite
>          Issue Type: Improvement
>            Reporter: bingfeng.guo
>            Assignee: bingfeng.guo
>            Priority: Major
>
> for example:
> {quote}SELECT 
>   count(1) 
> FROM 
>   SSB.P_LINEORDER as P_LINEORDER 
>   LEFT JOIN SSB.PART as PART ON P_LINEORDER.LO_PARTKEY = PART.P_PARTKEY 
>   LEFT JOIN SSB.DATES as DATES ON P_LINEORDER.LO_ORDERDATE = DATES.D_DATEKEY 
>   LEFT JOIN SSB.CUSTOMER as CUSTOMER ON P_LINEORDER.LO_CUSTKEY = 
> CUSTOMER.C_CUSTKEY 
>   LEFT JOIN SSB.SUPPLIER as SUPPLIER ON P_LINEORDER.LO_SUPPKEY = 
> SUPPLIER.S_SUPPKEY 
> WHERE 
>   P_LINEORDER.LO_ORDERDATE = '1992-01-01' 
>   or (
>     P_LINEORDER.LO_ORDERDATE in (
>       select 
>         D_DATEKEY 
>       from 
>         SSB.DATES 
>       where 
>         DATES.D_DATEKEY = '1992-01-01'
>     )
>   ) 
> LIMIT 
>   500
> {quote}
>  
> plan is
>  
> {quote}LimitRel(ctx=[], fetch=[500])
>     AggregateRel(group-set=[[]], groups=[null], EXPR$0=[COUNT()])
>       ProjectRel(ctx=[])
>         FilterRel(condition=[OR(=($5, 1992-01-01), CAST(AND(IS NOT NULL($65), 
> <>($62, 0))):BOOLEAN)])
>           JoinRel(condition=[=($5, $64)], joinType=[left])
>             JoinRel(condition=[true], joinType=[inner])
>               JoinRel(condition=[=($4, $55)], joinType=[left])
>                 JoinRel(condition=[=($2, $47)], joinType=[left])
>                   JoinRel(condition=[=($5, $30)], joinType=[left])
>                     JoinRel(condition=[=($3, $21)], joinType=[left])
>                       TableScan(table=[[SSB, P_LINEORDER]], fields=[[0, 1, 2, 
> 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]])
>                       TableScan(table=[[SSB, PART]], fields=[[0, 1, 2, 3, 4, 
> 5, 6, 7, 8]])
>                     TableScan(table=[[SSB, DATES]], fields=[[0, 1, 2, 3, 4, 
> 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]])
>                   TableScan(table=[[SSB, CUSTOMER]], fields=[[0, 1, 2, 3, 4, 
> 5, 6, 7]])
>                 TableScan(table=[[SSB, SUPPLIER]], fields=[[0, 1, 2, 3, 4, 5, 
> 6]])
>               AggregateRel(group-set=[[]], groups=[null], c=[COUNT()], 
> ck=[COUNT($0)])
>                 ProjectRel(D_DATEKEY=[$0])
>                   FilterRel(condition=[=($0, 1992-01-01)])
>                     TableScan(table=[[SSB, DATES]], fields=[[0, 1, 2, 3, 4, 
> 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]])
>             AggregateRel(group-set=[[0, 1]], groups=[null])
>               ProjectRel(D_DATEKEY=[$0], i=[true])
>                 FilterRel(condition=[=($0, 1992-01-01)])
>                   TableScan(table=[[SSB, DATES]], fields=[[0, 1, 2, 3, 4, 5, 
> 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]])
> {quote}
>  
>  
> But the optimal plan should be
> {quote} 
> LimitRel(fetch=[500])
>     AggregateRel(group-set=[[]], groups=[null], EXPR$0=[COUNT()])
>       ProjectRel(ctx=[])
>         FilterRel(condition=[=($5, 1992-01-01)])
>           JoinRel(condition=[=($5, $62)], joinType=[inner])
>             JoinRel(condition=[=($4, $55)], joinType=[left])
>               JoinRel(condition=[=($2, $47)], joinType=[left])
>                 JoinRel(condition=[=($5, $30)], joinType=[left])
>                   JoinRel(condition=[=($3, $21)], joinType=[left])
>                     TableScan(table=[[SSB, P_LINEORDER]], fields=[[0, 1, 2, 
> 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]])
>                     TableScan(table=[[SSB, PART]], fields=[[0, 1, 2, 3, 4, 5, 
> 6, 7, 8]])
>                   TableScan(table=[[SSB, DATES]], fields=[[0, 1, 2, 3, 4, 5, 
> 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]])
>                 TableScan(table=[[SSB, CUSTOMER]], fields=[[0, 1, 2, 3, 4, 5, 
> 6, 7]])
>               TableScan(table=[[SSB, SUPPLIER]], fields=[[0, 1, 2, 3, 4, 5, 
> 6]])
>             AggregateRel(group-set=[[0]], groups=[null])
>               ProjectRel(D_DATEKEY=[$0])
>                 FilterRel(condition=[=($0, 1992-01-01)])
>                   TableScan(table=[[SSB, DATES]], fields=[[0, 1, 2, 3, 4, 5, 
> 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]])
>  
> {quote}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Comment Edited] (CALCITE-4887) Filter subQuery remove not fully optimized

Reply via email to