Stamatis Zampetakis created CALCITE-7604:
--------------------------------------------

             Summary: Add rule to pull up GROUP BY above JOIN
                 Key: CALCITE-7604
                 URL: https://issues.apache.org/jira/browse/CALCITE-7604
             Project: Calcite
          Issue Type: New Feature
          Components: core
            Reporter: Stamatis Zampetakis
            Assignee: Stamatis Zampetakis


Implement a new rule to pull up a GROUP BY above a JOIN when possible. The 
main benefit of the group-by pull-up transformation is that, if the join is 
selective, it can reduce the number of input rows to the group-by.

The idea of pulling the aggregation above a join is rather old, and the 
following research papers are among the first to describe the individual 
transformations in detail:
 # Yan & Larson (1995) "Interchanging the order of grouping and join", Technical Report
 # Yan & Larson (1995) "Eager Aggregation and Lazy Aggregation", VLDB

Below is a simple query demonstrating the group-by pull-up transformation in 
SQL syntax.

+Before+
{code:sql}
SELECT s.sales
FROM (SELECT ss_sold_date_sk, SUM(ss_sales_price) AS sales
      FROM store_sales
      GROUP BY ss_sold_date_sk) s
JOIN date_dim d
  ON s.ss_sold_date_sk = d.d_date_sk
WHERE d.d_year = 2000;
{code}
+After+
{code:sql}
SELECT SUM(ss_sales_price) AS sales
FROM store_sales s
JOIN date_dim d
  ON s.ss_sold_date_sk = d.d_date_sk
WHERE d.d_year = 2000
GROUP BY s.ss_sold_date_sk;
{code}
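The rewrite is valid here because d_date_sk is the key of date_dim (as in 
TPC-DS): each store_sales row joins with at most one date_dim row, so the join 
does not duplicate rows, and each ss_sold_date_sk group is either kept whole or 
filtered out whole. More examples can be found in the respective papers.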
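To make the equivalence of the two queries above concrete, here is a minimal sketch, not Calcite code: a tiny in-memory evaluation of both plans in plain Java. The class GroupByPullUpDemo and all rows are hypothetical, and it assumes d_date_sk is unique in date_dim. Results are keyed by ss_sold_date_sk only so the two plans can be compared; the SQL itself projects just the sales column.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory check that the "Before" and "After" queries agree.
// Rows are invented for illustration; d_date_sk is assumed unique in date_dim.
class GroupByPullUpDemo {
  record StoreSale(int soldDateSk, double salesPrice) {}
  record DateDim(int dateSk, int year) {}

  static final List<StoreSale> STORE_SALES = List.of(
      new StoreSale(1, 10.0), new StoreSale(1, 5.0), new StoreSale(2, 7.0),
      new StoreSale(3, 1.0), new StoreSale(3, 2.0));
  static final List<DateDim> DATE_DIM = List.of(
      new DateDim(1, 2000), new DateDim(2, 1999), new DateDim(3, 2000));

  // Before: GROUP BY ss_sold_date_sk first, then join with date_dim and filter.
  static Map<Integer, Double> before() {
    Map<Integer, Double> sales = new TreeMap<>();
    for (StoreSale s : STORE_SALES) {
      sales.merge(s.soldDateSk(), s.salesPrice(), Double::sum);
    }
    Map<Integer, Double> result = new TreeMap<>();
    for (Map.Entry<Integer, Double> e : sales.entrySet()) {
      for (DateDim d : DATE_DIM) {
        if (e.getKey() == d.dateSk() && d.year() == 2000) {
          result.put(e.getKey(), e.getValue());
        }
      }
    }
    return result;
  }

  // Join store_sales with date_dim (d_year = 2000); these rows feed the GROUP BY in "After".
  static List<StoreSale> joinFiltered() {
    List<StoreSale> joined = new ArrayList<>();
    for (StoreSale s : STORE_SALES) {
      for (DateDim d : DATE_DIM) {
        if (s.soldDateSk() == d.dateSk() && d.year() == 2000) {
          joined.add(s);
        }
      }
    }
    return joined;
  }

  // After: join first, then GROUP BY ss_sold_date_sk over the surviving rows.
  static Map<Integer, Double> after() {
    Map<Integer, Double> result = new TreeMap<>();
    for (StoreSale s : joinFiltered()) {
      result.merge(s.soldDateSk(), s.salesPrice(), Double::sum);
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println("before = " + before());
    System.out.println("after  = " + after());
    System.out.println("same result: " + before().equals(after()));
    System.out.println("GROUP BY input rows: before=" + STORE_SALES.size()
        + ", after=" + joinFiltered().size());
  }
}
```

With these rows both plans produce {1=15.0, 3=3.0}, while the GROUP BY consumes 5 rows in the "Before" plan and only 4 in the "After" plan, because the selective join on d_year = 2000 removes the row for date 2 before aggregation.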

The first version of the rule aims to cover the simplest form of group-by 
pull-up described in "Interchanging the order of grouping and join". The more 
advanced "lazy" variants can be implemented in follow-ups.


