I agree, the aggregations were a quick first shot and should be reworked. They
are somewhat inspired by SQL-style aggregations, so MIN and MAX return the
minimum and maximum value of the column.

MinBy and MaxBy are not aggregations; they are rather "selectors", which
return the whole tuple holding the minimum or maximum value of the field. At
least in SQL terms...
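To make the distinction concrete, here is a small plain-Java sketch. It does
not use the Flink/Stratosphere API; `Pair`, `fieldWiseMin` and `minByF0` are
hypothetical names for illustration only. It contrasts a SQL-style MIN applied
to each field independently (like SELECT MIN(f0), MIN(f1)) with a
minBy-style selector that returns an actual element of the input.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class MinVsMinBy {

    // Hypothetical stand-in for a 2-field tuple; not the Flink Tuple2 class.
    static final class Pair {
        final int f0;
        final int f1;

        Pair(int f0, int f1) { this.f0 = f0; this.f1 = f1; }

        @Override public String toString() { return "(" + f0 + "," + f1 + ")"; }

        @Override public boolean equals(Object o) {
            if (!(o instanceof Pair)) return false;
            Pair p = (Pair) o;
            return f0 == p.f0 && f1 == p.f1;
        }

        @Override public int hashCode() { return 31 * f0 + f1; }
    }

    // SQL-style aggregation: each field is reduced independently, so the
    // result is not necessarily an element of the input.
    static Pair fieldWiseMin(List<Pair> data) {
        if (data.isEmpty()) throw new IllegalArgumentException("empty input");
        int min0 = Integer.MAX_VALUE;
        int min1 = Integer.MAX_VALUE;
        for (Pair p : data) {
            min0 = Math.min(min0, p.f0);
            min1 = Math.min(min1, p.f1);
        }
        return new Pair(min0, min1);
    }

    // Selector: returns the whole element that has the smallest f0.
    static Pair minByF0(List<Pair> data) {
        return data.stream().min(Comparator.comparingInt(p -> p.f0)).orElseThrow();
    }

    public static void main(String[] args) {
        List<Pair> data = Arrays.asList(new Pair(3, 1), new Pair(1, 5), new Pair(2, 4));
        Pair agg = fieldWiseMin(data);
        Pair sel = minByF0(data);
        // Prints "min:   (1,1) in dataset? false"
        System.out.println("min:   " + agg + " in dataset? " + data.contains(agg));
        // Prints "minBy: (1,5) in dataset? true"
        System.out.println("minBy: " + sel + " in dataset? " + data.contains(sel));
    }
}
```

The field-wise result (1,1) mixes values from two different input tuples,
which is exactly why a user might expect minBy semantics when asking for "the
minimum element".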



On Sat, Sep 6, 2014 at 1:01 PM, Fabian Hueske <[email protected]> wrote:

> I don't like the semantics of the current aggregation operator either.
> I'd be happy to discuss whether and how we should change it.
>
> Some time ago, I sketched an alternative in the old Stratosphere GitHub wiki,
> which might be a good starting point for a discussion:
>
>
> https://github.com/stratosphere/stratosphere/wiki/Design-of-Aggregate-Operator
>
> Cheers, Fabian
>
>
>
> 2014-09-06 12:01 GMT+02:00 Ufuk Celebi <[email protected]>:
>
> > On Fri, Sep 5, 2014 at 10:30 PM, Gyula Fóra <[email protected]> wrote:
> >
> > > For the sum aggregation this makes sense, but shouldn't min and max
> > > actually return an element of the dataset?
> > >
> >
> > There are also the minBy and maxBy methods, which return the Tuple with the
> > minimum/maximum value, whereas the min and max methods only work on the
> > field.
> >
> > I also have the feeling that this might be unintuitive and that users would
> > expect minBy/maxBy semantics to be the default.
> >
>
