Hi Julian,

Thanks a lot for your feedback. I think SqlAggFunction.getDistinctOptionality() is exactly what I am looking for.

BTW, I think ANY_VALUE and SINGLE_VALUE also belong to the category of duplicate insensitive functions. What do you think?

Best,
Liya Fan

On Tue, Oct 13, 2020 at 4:55 PM Julian Hyde <[email protected]> wrote:

> We already have this concept. See SqlAggFunction.getDistinctOptionality(),
> added in https://issues.apache.org/jira/browse/CALCITE-3159.
>
> Julian
>
> > On Oct 13, 2020, at 12:54 AM, Fan Liya <[email protected]> wrote:
> >
> > Hi all,
> >
> > I would like to introduce the idea of duplicate insensitive aggregate
> > functions.
> >
> > For such functions, the aggregation results remain the same even after
> > deduplication.
> >
> > For example, given a sequence of data {1, 1, 2, 2, 3, 5, 5}, the
> > aggregation results of MIN are the same regardless of whether we perform
> > data deduplication first. That is,
> >
> > MIN({1, 1, 2, 2, 3, 5, 5}) = MIN({1, 2, 3, 5})
> >
> > So MIN is a *duplicate insensitive function*.
> >
> > On the other hand, function SUM is not duplicate insensitive, because
> >
> > SUM({1, 1, 2, 2, 3, 5, 5}) != SUM({1, 2, 3, 5})
> >
> > The concept of duplicate insensitivity can help us in many optimization
> > scenarios.
> >
> > For example, the current implementation of AggregateMergeRule rules out any
> > aggregate calls for which the isDistinct() method returns true. However, for
> > duplicate insensitive functions, the rule should be applicable.
> >
> > Could you please give your valuable feedback?
> >
> > Best,
> > Liya Fan
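
The MIN/SUM comparison in the quoted message can be checked with a small sketch. It uses only the JDK and does not call any Calcite API; the class name DuplicateInsensitivityDemo and the helper sameAfterDedup are hypothetical names introduced for illustration. Note that agreement on one input does not prove a function is duplicate insensitive in general; the check can only confirm or refute the property for the data it is given.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.function.Function;

public class DuplicateInsensitivityDemo {

    /**
     * Returns true if the aggregate gives the same result on the raw input
     * and on the input with duplicates removed (hypothetical helper).
     */
    public static <R> boolean sameAfterDedup(
            Function<Collection<Integer>, R> agg, List<Integer> input) {
        // LinkedHashSet drops duplicates while keeping first-seen order.
        Collection<Integer> deduplicated = new LinkedHashSet<>(input);
        return agg.apply(input).equals(agg.apply(deduplicated));
    }

    /** Plain MIN over a non-empty collection. */
    public static Integer min(Collection<Integer> values) {
        return Collections.min(values);
    }

    /** Plain SUM over a collection. */
    public static Integer sum(Collection<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        // The data set from the quoted message.
        List<Integer> data = Arrays.asList(1, 1, 2, 2, 3, 5, 5);

        // MIN is 1 both before and after deduplication.
        System.out.println("MIN unchanged by dedup: "
            + sameAfterDedup(DuplicateInsensitivityDemo::min, data));

        // SUM is 19 on the raw data but 11 on {1, 2, 3, 5}.
        System.out.println("SUM unchanged by dedup: "
            + sameAfterDedup(DuplicateInsensitivityDemo::sum, data));
    }
}
```

On this data the first check should report true and the second false, which matches the MIN and SUM claims above.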
