[jira] [Commented] (CALCITE-2202) Aggregate Join Push-down on a Single Side

Zhong Yu (JIRA) Sun, 25 Mar 2018 09:42:24 -0700

    [ 
https://issues.apache.org/jira/browse/CALCITE-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16413095#comment-16413095
 ]


Zhong Yu commented on CALCITE-2202:
-----------------------------------

Sorry about the delay, but I've finished with a better proof (I hope) - 
[http://zhong-j-yu.github.io/aggregate-join-pushdown-3.pdf] - which extends 
pushdown to all join types and join conditions.

Forget about this Jira issue, which is probably not important, and we may  
close it. On more important points --

Pushdown works on any theta condition. The focus on equality condition is a 
distraction; equality across columns is very complicated and does not help to 
simplify reasoning. I do not invoke equality of values anywhere in my proof; I 
only invoke not-distinct-ness of values in the same type.

Pushdown works on any theta condition because we keep key columns, therefore 
rows that associate with each other before still associate after, regardless of 
matching conditions.

Pushdown should also work on semi join and outer join, quite intuitively. 
However, in outer join, we need a property that aggregate on null-row returns 
null-row, which seems invalid for count(). But, we can tweak split functions to 
make it valid. For example, we can treat a null result from count() as 1 (it 
can be null if it comes from the right-side in a left-outer join). The generic 
trick is simply to permute the result of aggregate. 

group by() on empty table does cause problems; the proof reveals exactly where. 
Example, "select count() from L cross join R group by R.g"; it ought to produce 
zero rows if L is empty; it's a bug to transform it to contain "select count() 
from L group by()". There's a workaround; or, we may simply skip over such 
pathological joins.

My proof is quite lengthy, which I'm not proud of. I wish I could reason about 
it in plain language, but the problem itself seems quite expansive, requiring 
very careful handling to cover all cases. I tried my best to structure it 
nicely. If you could, please find some unlucky guy to review my proof. Thanks!

> Aggregate Join Push-down on a Single Side
> -----------------------------------------
>
>                 Key: CALCITE-2202
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2202
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: next
>            Reporter: Zhong Yu
>            Assignee: Julian Hyde
>            Priority: Major
>             Fix For: next
>
>
> While investigating https://issues.apache.org/jira/browse/CALCITE-2195, it's 
> apparent that aggregation can be pushed on on a single side (either side), 
> and leave the other side non-aggregated, regardless of whether grouping 
> columns are unique on the other side. My analysis – 
> [http://zhong-j-yu.github.io/aggregate-join-push-down.pdf] .
> This may be useful when the metadata is insufficient; in any case, we may try 
> to provide all 3 possible transformations (aggregate on left only; right 
> only; both sides) to the cost based optimizer, so that the cheapest one can 
> be chosen based on stats. 
> Does this make any sense, anybody? If it sounds good, I'll implement it and 
> offer a PR. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (CALCITE-2202) Aggregate Join Push-down on a Single Side

Reply via email to