[ 
https://issues.apache.org/jira/browse/CALCITE-559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14280861#comment-14280861
 ] 

Jinfeng Ni edited comment on CALCITE-559 at 1/16/15 9:36 PM:
-------------------------------------------------------------

Maybe I miss something, but I just do not understand why after de-correlation, 
the table inside the correlated subquery will be joined with both the two 
tables in the outer query block. Essentially, the query looks like

{code}
    E  join D  
    where   E.id = D.id and E.col2 < ( Select avg(E2.col2) from E2 where 
E1.col1 = E2.col1); 
{code}

The plan after de-correlation will have two appearances  for E join D. The 
second is the one pushed into subQ and joined with E2. In other words, if we 
have n tables join in the outer query block, are we duplicating the n tables 
join in the de-correlated plan?  Performance-wise, is it optimal to duplicate 
those n table joins in the subquery?  Is it good enough to just push the 
correlated table (E) into the subq, do the join?
   
{code}
        LogicalJoin(condition=[true], joinType=[inner])
          EnumerableTableScan(table=[[foodmart2, employee]])
          EnumerableTableScan(table=[[foodmart2, department]])
        LogicalAggregate(group=[{0}], EXPR$0=[AVG($1)])
          LogicalProject($f0=[$1], salary=[$0])
            LogicalProject(salary=[$11], $f0=[$17])
              LogicalFilter(condition=[=($6, $17)])
                LogicalJoin(condition=[true], joinType=[inner])
                  EnumerableTableScan(table=[[foodmart2, employee]])
                  LogicalAggregate(group=[{0}])
                    LogicalProject($f0=[$6])
                      LogicalJoin(condition=[true], joinType=[inner])
                        EnumerableTableScan(table=[[foodmart2, employee]])
                        EnumerableTableScan(table=[[foodmart2, department]])
{code}

Is it possible for Calcite's decorrelator to rewrite the query into  (first 
join E with E2, then join against D) ? 

{code}
   Select ..
    From (    
        S E.id, ...
        From   E, (select E2.col1, avg(E2.col2) 
                        From E2 Group by E2.col1) X (Col1, avgval)
        where E.col1 = X.col1 and E.col2 < X.avgval
        ) Y,
       D
    Where Y.id = D.id;
{code}


was (Author: jni):
Maybe I miss something, but I just do not understand why after de-correlation, 
the table inside the correlated subquery will be joined with both the two 
tables in the outer query block. Essentially, the query looks like

{code}
    E  join D  
    where   E.id = D.id and E.col2 < ( Select avg(E.col2) from E2 where E1.col1 
= E2.col1); 
{code}

The plan after de-correlation will have two appearances  for E join D. The 
second is the one pushed into subQ and joined with E2. In other words, if we 
have n tables join in the outer query block, are we duplicating the n tables 
join in the de-correlated plan?  Performance-wise, is it optimal to duplicate 
those n table joins in the subquery?  Is it good enough to just push the 
correlated table (E) into the subq, do the join?
   
{code}
        LogicalJoin(condition=[true], joinType=[inner])
          EnumerableTableScan(table=[[foodmart2, employee]])
          EnumerableTableScan(table=[[foodmart2, department]])
        LogicalAggregate(group=[{0}], EXPR$0=[AVG($1)])
          LogicalProject($f0=[$1], salary=[$0])
            LogicalProject(salary=[$11], $f0=[$17])
              LogicalFilter(condition=[=($6, $17)])
                LogicalJoin(condition=[true], joinType=[inner])
                  EnumerableTableScan(table=[[foodmart2, employee]])
                  LogicalAggregate(group=[{0}])
                    LogicalProject($f0=[$6])
                      LogicalJoin(condition=[true], joinType=[inner])
                        EnumerableTableScan(table=[[foodmart2, employee]])
                        EnumerableTableScan(table=[[foodmart2, department]])
{code}

Is it possible for Calcite's decorrelator to rewrite the query into  (first 
join E with E2, then join against D) ? 

{code}
   Select ..
    From (    
        S E.id, ...
        From   E, (select E2.col1, avg(E2.col2) 
                        From E2 Group by E2.col1) X (Col1, avgval)
        where E.col1 = X.col1 and E.col2 < X.avgval
        ) Y,
       D
    Where Y.id = D.id;
{code}

> Correlated subquery will hit exception in Calcite
> -------------------------------------------------
>
>                 Key: CALCITE-559
>                 URL: https://issues.apache.org/jira/browse/CALCITE-559
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Jinfeng Ni
>            Assignee: Julian Hyde
>
> Calcite will throw exception, for a query with a correlated subquery.   I 
> tried both sqlline and as a Junit test case inJDBCTest. Both of them hit 
> exception:
> 1. sqlline
> {code}
> sqlline> !connect 
> jdbc:calcite:model=core/src/test/resources/hsqldb-foodmart-lattice-model.json 
> admin admin
> 0: jdbc:calcite:model=core/src/test/resources> select e."employee_id", 
> d."department_id" from "foodmart"."employee" e, "foodmart"."department" d 
> where e."department_id" = d."department_id" and e."salary" > (select 
> avg(e2."salary") from "foodmart"."employee" e2 where e2."store_id" = 
> e."store_id");
> Error: exception while executing query: while executing SQL [SELECT "$f0"
> FROM (SELECT "employee"."store_id" AS "$f0"
> FROM "foodmart"."department"
> INNER JOIN "foodmart"."employee") AS "t"
> GROUP BY "$f0"] (state=,code=0)
> {code}
> 2. As a Junit testcase:
> {code}
>   @Test public void testJoinCorreScalarSubQ()
>       throws ClassNotFoundException, SQLException {
>     CalciteAssert.that()
>         .with(CalciteAssert.Config.FOODMART_CLONE)
>         .query("select e.\"employee_id\", d.\"department_id\" "
>             + " from \"employee\" e, \"department\" d "
>             + " where e.\"department_id\" = d.\"department_id\" and "
>             + "       e.\"salary\" > (select avg(e2.\"salary\") "
>             + "                       from \"employee\" e2 "
>             + " where e2.\"store_id\" = e.\"store_id\")"
>         ).returnsCount(0);
>   }
> {code}
> Caused by: java.lang.RuntimeException: With materializationsEnabled=false, 
> limit=0
>       at 
> org.apache.calcite.test.CalciteAssert.assertQuery(CalciteAssert.java:461)
>       at 
> org.apache.calcite.test.CalciteAssert$AssertQuery.returns(CalciteAssert.java:1159)
>       ... 29 more
> Caused by: java.lang.AssertionError: Internal error: Error while applying 
> rule EnumerableJoinRule, args 
> [rel#160:LogicalJoin.NONE.[](left=rel#75:Subset#0.ENUMERABLE.[],right=rel#93:Subset#9.NONE.[],condition=AND(=($6,
>  $17), >($11, $18)),joinType=inner)]
>       at org.apache.calcite.util.Util.newInternal(Util.java:739)
>       at 
> org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:251)
>       at ...
>       ... 30 more
> Caused by: java.lang.AssertionError: Internal error: Error occurred while 
> applying rule EnumerableJoinRule
>       at org.apache.calcite.util.Util.newInternal(Util.java:739)
>       at      ... 44 more
> Caused by: java.lang.AssertionError: type mismatch:
> ref:
> DECIMAL(10, 4)
> input:
> DECIMAL(10, 4) NOT NULL
>       at org.apache.calcite.plan.RelOptUtil.eq(RelOptUtil.java:1608)
>       at org.apache.calcite.rex.RexChecker.visitInputRef(RexChecker.java:120)
>       at .....
> {code}
> The sqlline run seems to hit exception during enumerable execution, while the 
> Junit run seems to hit exception in planning phase.  Probably, that's because 
> sqlline by default does not turn on assertion check. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to