Perhaps it would be helpful to look at how Clickhouse's aggregate
functions are implemented?

https://github.com/ClickHouse/ClickHouse/tree/master/src/AggregateFunctions

You're welcome to propose improvements to the kernel interface to
accommodate more complex aggregate functions

On Wed, Sep 16, 2020 at 5:27 AM Jorge Cardoso Leitão
<jorgecarlei...@gmail.com> wrote:
>
> Hi Yibo,
>
> That is correct. The simplest example is an average of 3 elements
> {x1,x2,x3}, in two chunks: {x1} and {x2,x3}. The average of the average is
> not equal to the average:
>
> avg({avg({x1}), avg({x2,x3})}) = ((x1) + (x2 + x3)/2)/2 != (x1 + x2 + x3) /
> 3 = avg({x1,x2,x3})
>
> We are solving this in DataFusion (Rust):
>
> Issue: https://issues.apache.org/jira/browse/ARROW-9937
> Proposal:
> https://docs.google.com/document/d/1n-GS103ih3QIeQMbf_zyDStjUmryRQd45ypgk884LHU/edit
> PR https://github.com/apache/arrow/pull/8172,
>
> Essentially, what you said, there needs to be two operations, `update` and
> `merge`, which are not the same.
> There are many other examples, and that is also why e.g. spark's UDAF's
> interface requires `update` and `merge`
>
>
>
>
> On Wed, Sep 16, 2020 at 12:12 PM Yibo Cai <yibo....@arm.com> wrote:
>
> > Hi,
> >
> > I have a question about aggregate kernel implementation. Any help is
> > appreciated.
> >
> > Aggregate kernel implements "consume" and "merge" interfaces. For a
> > chunked array, "consume" is called for each array to get a temporary
> > aggregated result, then "merge" it with previously consumed result. For
> > associative operations like min/max/sum, this pattern is convenient. We can
> > easily "merge" min/max/sum of two arrays, e.g, sum([array_a, array_b]) =
> > sum(array_a) + sum(array_b).
> >
> > But I wonder what's the best approach to deal with operations like
> > stdev/percentile. Results of these operations cannot be easily "merged". We
> > have to walk through all the chunks to get the result. For these
> > operations, looks "consume" must copy the input array and do all
> > calculation once at "finalize" time. Or we don't expect it to support
> > chunked array for them.
> >
> > Yibo
> >

Reply via email to