Stamatis Zampetakis created CALCITE-7604:
--------------------------------------------
Summary: Add rule to pull up GROUP BY above JOIN
Key: CALCITE-7604
URL: https://issues.apache.org/jira/browse/CALCITE-7604
Project: Calcite
Issue Type: New Feature
Components: core
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis
Implement a new rule to pull up a GROUP BY above a JOIN when possible. The
major benefit of the group-by pull up transformation is that the join may
reduce the number of input rows to the group-by, if the join is selective.
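To make the benefit concrete, here is a minimal Python sketch over made-up in-memory tables. The names `sales`, `dates`, and `aggregate` are hypothetical and not part of Calcite. It counts how many rows reach the aggregation when the group-by runs below the join versus above it, assuming the join key on the dimension side is unique.

```python
from collections import defaultdict

# Hypothetical toy data: sales rows are (date_key, price);
# dates maps a unique date_key to a year.
sales = [(1, 10.0), (1, 5.0), (2, 7.0), (3, 1.0), (3, 2.0), (4, 8.0)]
dates = {1: 1999, 2: 2000, 3: 2001, 4: 2000}


def aggregate(rows):
    """Sum price per date_key and report how many input rows were consumed."""
    sums = defaultdict(float)
    for key, price in rows:
        sums[key] += price
    return dict(sums), len(rows)


# Group-by below the join: every sales row is aggregated, then joined/filtered.
sums_below, input_below = aggregate(sales)
result_below = {k: v for k, v in sums_below.items() if dates.get(k) == 2000}

# Group-by above the join: the selective join runs first.
joined = [(k, p) for k, p in sales if dates.get(k) == 2000]
result_above, input_above = aggregate(joined)

print(input_below, input_above)  # prints: 6 2
```

Both plans produce the same sums here, but the pulled-up aggregation consumes only the rows that survive the join.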
The idea of pulling the aggregation above a join is rather old, and the
following research papers are among the first to describe the individual
transformations in detail:
# Yan & Larson (1995) "Interchanging the order of grouping and join", Technical Report
# Yan & Larson (1995) "Eager Aggregation and Lazy Aggregation", VLDB
Below is a simple query demonstrating the group-by pull-up transformation
using SQL syntax.
+Before+
{code:sql}
SELECT s.sales
FROM (SELECT ss_sold_date_sk, SUM(ss_sales_price) AS sales
      FROM store_sales
      GROUP BY ss_sold_date_sk) s
JOIN date_dim d
  ON s.ss_sold_date_sk = d.d_date_sk
WHERE d.d_year = 2000;
{code}
+After+
{code:sql}
SELECT SUM(ss_sales_price) AS sales
FROM store_sales s
JOIN date_dim d
ON s.ss_sold_date_sk = d.d_date_sk
WHERE d.d_year = 2000
GROUP BY s.ss_sold_date_sk;
{code}
More examples can be found in the respective papers.
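The equivalence of the two queries above can be checked on a toy dataset. The sketch below uses Python's built-in `sqlite3` module rather than Calcite. The schema and rows are invented for illustration, and the check relies on the assumption that `d_date_sk` is unique in `date_dim` (declared here as a primary key), so that each `store_sales` row matches at most one `date_dim` row.

```python
import sqlite3

# In-memory database with a made-up miniature of the two tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE date_dim (d_date_sk INTEGER PRIMARY KEY, d_year INTEGER);
    CREATE TABLE store_sales (ss_sold_date_sk INTEGER, ss_sales_price REAL);
    INSERT INTO date_dim VALUES (1, 1999), (2, 2000), (3, 2000);
    INSERT INTO store_sales VALUES
        (1, 10.0), (2, 5.0), (2, 7.5), (3, 1.0), (NULL, 4.0);
""")

# Group-by below the join (the "Before" query).
BEFORE = """
    SELECT s.sales
    FROM (SELECT ss_sold_date_sk, SUM(ss_sales_price) AS sales
          FROM store_sales
          GROUP BY ss_sold_date_sk) s
    JOIN date_dim d ON s.ss_sold_date_sk = d.d_date_sk
    WHERE d.d_year = 2000
"""

# Group-by pulled above the join (the "After" query).
AFTER = """
    SELECT SUM(ss_sales_price) AS sales
    FROM store_sales s
    JOIN date_dim d ON s.ss_sold_date_sk = d.d_date_sk
    WHERE d.d_year = 2000
    GROUP BY s.ss_sold_date_sk
"""

# Sort because neither query guarantees an output order.
before_rows = sorted(conn.execute(BEFORE).fetchall())
after_rows = sorted(conn.execute(AFTER).fetchall())
print(before_rows == after_rows)  # prints: True
```

If `d_date_sk` were not unique, the join in the "After" query would duplicate `store_sales` rows before the `SUM`, and the two queries would no longer agree.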
The first version of the rule aims to cover the simplest form of group-by
pull-up, described in "Interchanging the order of grouping and join". The more
advanced "lazy" variants can be implemented in follow-ups.