Perhaps it would be helpful to look at how Clickhouse's aggregate functions are implemented?
https://github.com/ClickHouse/ClickHouse/tree/master/src/AggregateFunctions You're welcome to propose improvements to the kernel interface to accommodate more complex aggregate functions On Wed, Sep 16, 2020 at 5:27 AM Jorge Cardoso Leitão <jorgecarlei...@gmail.com> wrote: > > Hi Yibo, > > That is correct. The simplest example is an average of 3 elements > {x1,x2,x3}, in two chunks: {x1} and {x2,x3}. The average of the average is > not equal to the average: > > avg({avg({x1}), avg({x2,x3})}) = ((x1) + (x2 + x3)/2)/2 != (x1 + x2 + x3) / > 3 = avg({x1,x2,x3}) > > We are solving this in DataFusion (Rust): > > Issue: https://issues.apache.org/jira/browse/ARROW-9937 > Proposal: > https://docs.google.com/document/d/1n-GS103ih3QIeQMbf_zyDStjUmryRQd45ypgk884LHU/edit > PR https://github.com/apache/arrow/pull/8172, > > Essentially, what you said, there needs to be two operations, `update` and > `merge`, which are not the same. > There are many other examples, and that is also why e.g. spark's UDAF's > interface requires `update` and `merge` > > > > > On Wed, Sep 16, 2020 at 12:12 PM Yibo Cai <yibo....@arm.com> wrote: > > > Hi, > > > > I have a question about aggregate kernel implementation. Any help is > > appreciated. > > > > Aggregate kernel implements "consume" and "merge" interfaces. For a > > chunked array, "consume" is called for each array to get a temporary > > aggregated result, then "merge" it with previously consumed result. For > > associative operations like min/max/sum, this pattern is convenient. We can > > easily "merge" min/max/sum of two arrays, e.g, sum([array_a, array_b]) = > > sum(array_a) + sum(array_b). > > > > But I wonder what's the best approach to deal with operations like > > stdev/percentile. Results of these operations cannot be easily "merged". We > > have to walk through all the chunks to get the result. For these > > operations, looks "consume" must copy the input array and do all > > calculation once at "finalize" time. Or we don't expect it to support > > chunked array for them. > > > > Yibo > >