I don't like the semantics of the current aggregation operator either. I'd be happy to discuss whether and how we should change it.
Some time ago, I sketched an alternative in the old Stratosphere GitHub wiki, which might be a good starting point for a discussion:
https://github.com/stratosphere/stratosphere/wiki/Design-of-Aggregate-Operator

Cheers, Fabian

2014-09-06 12:01 GMT+02:00 Ufuk Celebi <[email protected]>:

> On Fri, Sep 5, 2014 at 10:30 PM, Gyula Fóra <[email protected]> wrote:
>
> > For the sum aggregation this makes sense, but shouldn't min and max
> > actually return an element of the dataset?
>
> There are also the minBy and maxBy methods, which return the Tuple with the
> minimum/maximum value whereas the min and max methods just work on the
> field.
>
> I also have the feeling that this might be unintuitive and that users would
> expect minBy/maxBy semantics to be the default.
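To make the difference discussed above concrete, here is a minimal sketch in plain Java. It does not use the actual DataSet API: the Pair class and the minField/minBy helpers are hypothetical stand-ins, and the choice to carry the non-aggregated field over from the first element is an assumption made only for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class AggregationSemantics {

    /** Hypothetical two-field record standing in for a (String, Integer) tuple. */
    static final class Pair {
        final String name;
        final int value;

        Pair(String name, int value) {
            this.name = name;
            this.value = value;
        }

        @Override
        public String toString() {
            return "(" + name + ", " + value + ")";
        }
    }

    /**
     * Field-wise "min": only the aggregated field is meaningful.
     * Assumption: the other field is taken from the first element,
     * so the result need not be an element of the input.
     */
    static Pair minField(List<Pair> data) {
        int min = data.get(0).value;
        for (Pair p : data) {
            min = Math.min(min, p.value);
        }
        return new Pair(data.get(0).name, min);
    }

    /** "minBy": returns the whole element whose field is minimal. */
    static Pair minBy(List<Pair> data) {
        return data.stream()
                .min(Comparator.comparingInt((Pair p) -> p.value))
                .get();
    }

    public static void main(String[] args) {
        List<Pair> data = Arrays.asList(
                new Pair("a", 3), new Pair("b", 1), new Pair("c", 2));

        System.out.println(minField(data)); // prints (a, 1), not an input element
        System.out.println(minBy(data));    // prints (b, 1), an input element
    }
}
```

With this input, the field-wise variant produces a record that never appeared in the data, which is the behaviour the thread describes as unintuitive.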
