Shiven Dvrk created CALCITE-4984:
------------------------------------

             Summary: RelNode decorrelations is generating wrong group keys
                 Key: CALCITE-4984
                 URL: https://issues.apache.org/jira/browse/CALCITE-4984
             Project: Calcite
          Issue Type: Bug
            Reporter: Shiven Dvrk


We have a relnode of this format:
{code:java}
LogicalAggregate(group=[{0, 2}], agg#0=[SUM($1)])
  LogicalProject(C501=[$3], $f1=[-($2, $4)], $f2=[1])
    LogicalCorrelate(correlation=[$cor0], joinType=[left], 
requiredColumns=[{0}])
      LogicalTableScan(table=[[T1]])
      LogicalAggregate(group=[{}], agg#0=[SUM($5)])
        LogicalJoin(condition=[=($1, $0)], joinType=[inner])
          LogicalProject(ID=[$0])
            LogicalTableScan(table=[[T2]])
          LogicalProject(ID=[$0], C508=[$1], C507=[$2], C501=[$3], C509=[$4])
            LogicalFilter(condition=[=($0, $cor0.ID)])
              LogicalTableScan(table=[[T3]]) {code}
Which represents a query 
{code:java}
SELECT
  t10.C501,
  1 as Y,
  SUM(t10.C505 - t8.test_sum) AS Z 
FROM
   T1 AS t10,
   LATERAL (
   SELECT
      SUM(t0.C509) AS test_sum 
   FROM
      (
         SELECT
            ID 
         FROM
            T2
      )
      AS t 
      INNER JOIN
         (
            SELECT
               * 
            FROM
               T3 
            WHERE
               C508 = t10.ID
         )
         AS t0 
         ON t.ID = t0.C507) AS t8
         GROUP BY t10.C501, 1 
{code}

When we use RelDecorrelator.decorrelateQuery(), it is decorrelating the relnode 
to following:


{code}
LogicalAggregate(group=[{0, 1}], Z=[SUM($1)])
  LogicalProject(C501=[$0], X=[-($1, $7)])
    LogicalJoin(condition=[=($5, $6)], joinType=[left])
      QueryTableScan(table=[[QU, T1]], fields=[[0, 1, 2, 3, 4, 5]])
      LogicalAggregate(group=[{0}], test_sum=[SUM($1)])
        LogicalProject(C508=[$4], C509=[$5])
          LogicalJoin(condition=[=($3, $0)], joinType=[inner])
            LogicalProject(ID=[$5])
              QueryTableScan(table=[[QUERY, T2]], fields=[[0, 1, 2, 3, 4, 5]])
            QueryTableScan(table=[[QUERY, T3]], fields=[[0, 1, 2, 3, 4]])
{code}

Representing the query like this:
{code}
            SELECT
               t10.C501,
               t10.C505 - t18.test_sum AS X,
               SUM(t10.C505 - t18.test_sum) AS Z
            FROM
               T1 AS t10
               LEFT JOIN
                  (
                     SELECT
                        T3.C508,
                        SUM(T3.C509) AS test_sum
                     FROM
                        (
                           SELECT
                              ID
                           FROM
                              T2
                        )
                        AS t
                        INNER JOIN
                           T3
                           ON t.ID = T3.C507
                     GROUP BY
                        T3.C508
                  )
                  AS t18
                  ON t10.ID = t18.C508
            GROUP BY
               t10.C501,
               t10.C5633_505 - t18.test_sum
{code}
 

The decorrelation logic is changing the group by keys when the literal is 
involved. any idea on the reason behind this behavior?




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to