[
https://issues.apache.org/jira/browse/CALCITE-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16392086#comment-16392086
]
Julian Hyde commented on CALCITE-2202:
--------------------------------------
Maybe I've been out of academia and in engineering for too long, but I prefer a
few well-chosen examples than a formal proof. Some examples stake out the
corner cases (e.g. empty group by) and other examples are easily generalized
(e.g. if something applies to MIN it applies to MAX and similar functions, and
if something that applies to AVG it applies to STDEV and similar functions).
Are you claiming that it is better to push down on only one side? (I could see
how it would be simpler, therefore better, if you could push down to the left
first, then push down to the right.) I contend that the desired result is
{code}select sum(e.s * d.c)
from (select deptno, sum(sal) as s from emp group by deptno) as e
join (select deptno, count(*) as c from dept group by deptno) as d
on e.deptno = d.deptno
group by e.deptno{code} (I mistakenly omitted the "group by deptno" in the two
inner queries last time) and I wonder whether you could get to that by applying
your methods.
Most aggregate functions we are familiar with have one argument, but would your
methods apply to aggregate functions with more than one? Let's consider COUNT
and COVAR_POP, both of which can take two arguments, and ask whether they can
be pushed down if they have one argument from each side of a join.
> Aggregate Join Push-down on a Single Side
> -----------------------------------------
>
> Key: CALCITE-2202
> URL: https://issues.apache.org/jira/browse/CALCITE-2202
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Affects Versions: next
> Reporter: Zhong Yu
> Assignee: Julian Hyde
> Priority: Major
> Fix For: next
>
>
> While investigating https://issues.apache.org/jira/browse/CALCITE-2195, it's
> apparent that aggregation can be pushed on on a single side (either side),
> and leave the other side non-aggregated, regardless of whether grouping
> columns are unique on the other side. My analysis –
> [http://zhong-j-yu.github.io/aggregate-join-push-down.pdf] .
> This may be useful when the metadata is insufficient; in any case, we may try
> to provide all 3 possible transformations (aggregate on left only; right
> only; both sides) to the cost based optimizer, so that the cheapest one can
> be chosen based on stats.
> Does this make any sense, anybody? If it sounds good, I'll implement it and
> offer a PR.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)