Stamatis Zampetakis created CALCITE-7446:
--------------------------------------------
Summary: Add rule to pushdown conditions from filtered aggregates
Key: CALCITE-7446
URL: https://issues.apache.org/jira/browse/CALCITE-7446
Project: Calcite
Issue Type: Improvement
Components: core
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis
Add a new rule, AggregateExtractFilterRule, to extract and pushdown conditions
from filtered aggregates.
In SQL syntax, the transformation on a very simple query is outlined below.
+Before+
{code:sql}
SELECT
SUM(sal) FILTER (WHERE deptno = 10),
SUM(sal) FILTER (WHERE deptno = 20),
FROM emp
{code}
+After+
{code:sql}
SELECT
SUM(sal) FILTER (WHERE deptno = 10),
SUM(sal) FILTER (WHERE deptno = 20),
FROM emp
WHERE deptno = 10 OR deptno = 20
{code}
For the transformation to be valid all aggregate functions must have a filter
condition.
The initial version of the rule will only deal with grand totals (no GROUP BY).
At first the additional filter in the WHERE clause may appear somewhat
redundant but in practice it can help to greatly reduce the rows that reach the
aggregation functions especially if the table is large. In combination with
other rules/techniques (such as index selection, partition pruning, filter
pushdown) it can make a big different in performance.
Below a slightly more complex example showcasing how the new rule combined with
the existing JOIN pushdown rules can lead to a better plan.
{code:sql}
SELECT
SUM(e.sal) FILTER (WHERE d.name = 'SALES'),
SUM(e.sal) FILTER (WHERE d.name = 'RESEARCH')
FROM emp e JOIN dept d ON e.deptno = d.deptno
{code}
Currently, the aggregate processes all rows that come from the join of {{emp}}
and {{dept}} tables. However, the aggregate will throw away all rows that don't
involve the SALES and RESEARCH department so we can pushdown these conditions
below the join and just above the scan of the {{dept}} table.
+Before+
{noformat}
LogicalAggregate(group=[{}], EXPR$0=[SUM($0) FILTER $1], EXPR$1=[SUM($0) FILTER
$2])
LogicalProject(SAL=[$5], $f1=[=($10, 'SALES')], $f2=[=($10, 'RESEARCH')])
LogicalJoin(condition=[=($7, $9)], joinType=[inner])
LogicalTableScan(table=[[CATALOG, SALES, EMP]])
LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
{noformat}
+After+
{noformat}
LogicalAggregate(group=[{}], EXPR$0=[SUM($0) FILTER $1], EXPR$1=[SUM($0) FILTER
$2])
LogicalProject(SAL=[$5], $f1=[=($10, 'SALES')], $f2=[=($10, 'RESEARCH')])
LogicalJoin(condition=[=($7, $9)], joinType=[inner])
LogicalTableScan(table=[[CATALOG, SALES, EMP]])
LogicalFilter(condition=[SEARCH($1, Sarg['RESEARCH':VARCHAR(10),
'SALES':VARCHAR(10)]:VARCHAR(10))])
LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
{noformat}
The JOIN example above is purely for demonstration purposes. The new rule is
not gonna have any specific logic for handling join or other operators. The
pushdown of the filter below the join (if possible) will be handled by the
existing rules.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)