Hi Julian,

Thanks a lot for your feedback.
I think SqlAggFunction.getDistinctOptionality() is exactly what I
am looking for.

BTW, I think ANY_VALUE and SINGLE_VALUE also belong to the category of
duplicate insensitive functions.
What do you think?

Best,
Liya Fan



On Tue, Oct 13, 2020 at 4:55 PM Julian Hyde <[email protected]> wrote:

> We already have this concept. See SqlAggFunction.getDistinctOptionality(),
> added in https://issues.apache.org/jira/browse/CALCITE-3159.
>
> Julian
>
>
> > On Oct 13, 2020, at 12:54 AM, Fan Liya <[email protected]> wrote:
> >
> > Hi all,
> >
> > I would like to introduce the idea of duplicate insensitive aggregate
> > functions.
> >
> > For such functions, the aggregation results remain the same even after
> > deduplication.
> >
> > For example, given a sequence of data {1, 1, 2, 2, 3, 5, 5}, the
> > aggregation results of MIN are the same regardless of whether we perform
> > data deduplication first. That is,
> >
> > MIN({1, 1, 2, 2, 3, 5, 5}) = MIN({1, 2, 3, 5})
> >
> > So MIN is a *duplicate insensitive function*.
> >
> > On the other hand, function SUM is not duplicate insensitive, because
> >
> > SUM({1, 1, 2, 2, 3, 5, 5}) != SUM({1, 2, 3, 5})
> >
> > The concept of duplicate insensitivity can help us in many
> > optimization scenarios.
> >
> > For example, the current implementation of AggregateMergeRule rules out
> > any aggregate calls for which the isDistinct() method returns true.
> > However, for duplicate insensitive functions, the rule should be
> > applicable.
> >
> > Could you please give your valuable feedback?
> >
> > Best,
> > Liya Fan
>
>

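The MIN/SUM comparison in the quoted message can be checked with a small standalone sketch. It does not use Calcite; the class name DuplicateInsensitivity and the helpers dedup, min, sum and sameAfterDedup are hypothetical names introduced only for illustration. It compares an aggregate over the raw input with the same aggregate over the deduplicated input.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.function.Function;

// Illustration only: plain Java, not Calcite code. All names are hypothetical.
public class DuplicateInsensitivity {

    // Remove duplicates while keeping first-seen order.
    static List<Integer> dedup(List<Integer> values) {
        return new ArrayList<>(new LinkedHashSet<>(values));
    }

    // MIN over a non-empty list.
    static int min(List<Integer> values) {
        return Collections.min(values);
    }

    // SUM over a list.
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // True if the aggregate gives the same result on this input
    // before and after deduplication.
    static boolean sameAfterDedup(Function<List<Integer>, Integer> agg,
                                  List<Integer> values) {
        return agg.apply(values).equals(agg.apply(dedup(values)));
    }

    public static void main(String[] args) {
        List<Integer> data = List.of(1, 1, 2, 2, 3, 5, 5);
        System.out.println("MIN unchanged by dedup: "
            + sameAfterDedup(DuplicateInsensitivity::min, data));
        System.out.println("SUM unchanged by dedup: "
            + sameAfterDedup(DuplicateInsensitivity::sum, data));
    }
}
```

This only tests one input, so a "true" result is evidence rather than proof. A single "false" result, as with SUM here, is enough to show that a function is not duplicate insensitive.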