[
https://issues.apache.org/jira/browse/CALCITE-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16413095#comment-16413095
]
Zhong Yu commented on CALCITE-2202:
-----------------------------------
Sorry about the delay, but I've finished with a better proof (I hope) -
[http://zhong-j-yu.github.io/aggregate-join-pushdown-3.pdf] - which extends
pushdown to all join types and join conditions.
Forget about this Jira issue, which is probably not important, and we may
close it. On more important points --
Pushdown works on any theta condition. The focus on equality condition is a
distraction; equality across columns is very complicated and does not help to
simplify reasoning. I do not invoke equality of values anywhere in my proof; I
only invoke not-distinct-ness of values in the same type.
Pushdown works on any theta condition because we keep key columns, therefore
rows that associate with each other before still associate after, regardless of
matching conditions.
Pushdown should also work on semi join and outer join, quite intuitively.
However, in outer join, we need a property that aggregate on null-row returns
null-row, which seems invalid for count(). But, we can tweak split functions to
make it valid. For example, we can treat a null result from count() as 1 (it
can be null if it comes from the right-side in a left-outer join). The generic
trick is simply to permute the result of aggregate.
group by() on empty table does cause problems; the proof reveals exactly where.
Example, "select count() from L cross join R group by R.g"; it ought to produce
zero rows if L is empty; it's a bug to transform it to contain "select count()
from L group by()". There's a workaround; or, we may simply skip over such
pathological joins.
My proof is quite lengthy, which I'm not proud of. I wish I could reason about
it in plain language, but the problem itself seems quite expansive, requiring
very careful handling to cover all cases. I tried my best to structure it
nicely. If you could, please find some unlucky guy to review my proof. Thanks!
> Aggregate Join Push-down on a Single Side
> -----------------------------------------
>
> Key: CALCITE-2202
> URL: https://issues.apache.org/jira/browse/CALCITE-2202
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Affects Versions: next
> Reporter: Zhong Yu
> Assignee: Julian Hyde
> Priority: Major
> Fix For: next
>
>
> While investigating https://issues.apache.org/jira/browse/CALCITE-2195, it's
> apparent that aggregation can be pushed on on a single side (either side),
> and leave the other side non-aggregated, regardless of whether grouping
> columns are unique on the other side. My analysis –
> [http://zhong-j-yu.github.io/aggregate-join-push-down.pdf] .
> This may be useful when the metadata is insufficient; in any case, we may try
> to provide all 3 possible transformations (aggregate on left only; right
> only; both sides) to the cost based optimizer, so that the cheapest one can
> be chosen based on stats.
> Does this make any sense, anybody? If it sounds good, I'll implement it and
> offer a PR.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)