It’s possible that the rewrite rule is a bit over-cautious. It is aggregating something that it could know has only one row.
It’s worth thinking over what would happen for left, right and full joins. They challenge the “exactly one” assumption. And if you’re generalizing "group by k” to “group by k1, k2, … kn”, that’s fine, but give extra thought to n = 0 (i.e. group by ()). It produces one row even if the input is zero. I strongly recommend that you write tests that produce data, and run those tests on both calcite and another DB. Did you see https://issues.apache.org/jira/browse/CALCITE-2108? <https://issues.apache.org/jira/browse/CALCITE-2108?> It is related. Julian > On Mar 29, 2018, at 5:40 AM, Alessandro Solimando > <[email protected]> wrote: > > Hi Haisgheng, > I agree with you, since the groupby on t1.b, by definition, produces one > tuple per distinct t1.b value, for any given t1.b distinct values you have > at most one tuple both sides, so the direct multiplication cnt*c suffice as > SUM0 sums over a single element. > > But I don't know the logic behind the rule, so I cannot say if the > rewriting is there for other reasons (triggering other possible chains of > rewriting for instance), but definitely an interesting remark :) > > Best regards, > Alessandro > > On 29 March 2018 at 13:54, Haisheng Yuan <[email protected]> wrote: > >> Hi devs, >> >> While reading SqlSplittableAggFunction and AggregateJoinTransposeRule >> source >> code, I noticed that in CountSplitter.topSplit, it creates an SUM0 AggCall, >> which makes sense for the following query: >> select count(t1.a) from t1 join t2 on t1.b=t2.b; >> >> which will be transformed to: >> select sum0(cnt*c) from (select b, count(a) as cnt from t1 group by b) t1 >> join (select b, count(*) as c from t2 group by b)t2 on t1.b=t2.b ; >> >> But for another query: >> select t1.b, count(t1.a) from t1 join t2 on t1.b=t2.b group by t1.b; >> >> which will be transformed to: >> select t1.b, sum0(cnt*c) from (select b, count(a) as cnt from t1 group by >> b) t1 join (select b, count(*) as c from t2 group by b)t2 on t1.b=t2.b >> group by t1.b; >> >> In fact, query without additional SUM0 group agg should be enough: >> select t1.b, cnt*c from (select b, count(a) as cnt from t1 group by b) t1 >> join (select b, count(*) as c from t2 group by b) t2 on t1.b=t2.b; >> >> Am I missing something? Correct me if I am wrong. >> >> Thanks~ >> Haisheng Yuan >>
