Re: Hi / Aggregation support

Fabian Hueske Mon, 10 Nov 2014 03:01:03 -0800

How/where do you plan to define the methods min(1), max(1), and cnt()?
If these are static methods in some kind of Aggregation class, it won't
look so concise anymore, or am I missing something here?


I would be fine with either way; the second one would be nice if it can be
done like that.

2014-11-10 11:03 GMT+01:00 Gyula Fora <gyf...@apache.org>:

> I also support this approach:
>
>  ds.groupBy(0).aggregate(min(1), max(1), cnt())
>
> I think it makes the code more readable, because it is easy to see what's
> in the result tuple.
>
> Gyula
>
> > On 10 Nov 2014, at 10:49, Aljoscha Krettek <aljos...@apache.org> wrote:
> >
> > I like this version: ds.groupBy(0).aggregate(min(1), max(1), cnt()),
> > very concise.
> >
> > On Mon, Nov 10, 2014 at 10:42 AM, Viktor Rosenfeld
> > <viktor.rosenf...@tu-berlin.de> wrote:
> >> Hi Fabian,
> >>
> >> I ran into a problem with your syntax example:
> >>
> >> DataSet<Tuple2<String, Integer>> ds = ...
> >> DataSet<Tuple4<Tuple2<String, Integer>, Integer, Integer, Long>> result =
> >>     ds.groupBy(0).min(1).andMax(1).andCnt();
> >>
> >> Basically, in the example above we don't know how long the chain of
> >> aggregation method calls is. Also, each aggregation method call adds a
> >> field to the result tuple (the first call to groupBy returns a
> >> Tuple1). Because the resultType of an operator is specified in the
> >> constructor, every one of those method calls needs to create a new
> >> Operator<OUT> with the correct result type. However, only the
> >> translateToDataFlow method of the last method call in the chain should
> >> actually compute the aggregation.
> >>
> >> This can be achieved by testing if an aggregation method is called on
> >> an AggregationOperator. The translateToDataFlow method of the
> >> operators in the start/middle of the chain would then just return a
> >> MapOperatorBase which simply extends the tuple. The
> >> translateToDataFlow method of the last operator in the chain would
> >> return a GroupReduceOperatorBase.
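The chained strategy described above can be modeled in plain Java to see its shape. This is a hypothetical sketch, not Flink code: `AggNode`, `groupBy`, `plan`, and the string-returning `translateToDataFlow` are made-up stand-ins for the real operator classes. Each chained call creates a new node whose result arity is fixed at construction; a node that has another aggregation chained onto it translates to a tuple-widening map, and only the last node translates to the group reduce.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, Flink-free model of ds.groupBy(0).min(1).andMax(1).andCnt().
// All names are illustrative; "translation" just returns a description string.
public class ChainedAggregationSketch {

    // One node per chained call; its result arity is fixed when it is constructed.
    static final class AggNode {
        final AggNode input;       // previous node in the chain, null for the groupBy node
        final String aggregation;  // e.g. "min(1)"; null for the groupBy node
        final int arity;           // arity of this node's result tuple
        boolean hasSuccessor;      // true once another aggregation is chained onto this node

        AggNode(AggNode input, String aggregation) {
            this.input = input;
            this.aggregation = aggregation;
            // groupBy wraps each input tuple in a Tuple1; every aggregation adds one field
            this.arity = (input == null) ? 1 : input.arity + 1;
            if (input != null) {
                // "an aggregation method is called on an AggregationOperator"
                input.hasSuccessor = true;
            }
        }

        AggNode min(int field)    { return new AggNode(this, "min(" + field + ")"); }
        AggNode andMax(int field) { return new AggNode(this, "max(" + field + ")"); }
        AggNode andCnt()          { return new AggNode(this, "cnt()"); }

        // Aggregations collected from the first one up to and including this node.
        List<String> aggregationsSoFar() {
            List<String> result = new ArrayList<>();
            for (AggNode n = this; n.input != null; n = n.input) {
                result.add(n.aggregation);
            }
            Collections.reverse(result);
            return result;
        }

        // Intermediate nodes only widen the tuple; the last node computes everything.
        String translateToDataFlow() {
            if (hasSuccessor) {
                return "Map(extend to Tuple" + arity + ")";
            }
            return "GroupReduce(Tuple" + arity + ", " + aggregationsSoFar() + ")";
        }
    }

    // The key field is not used by this sketch; only the chain structure matters.
    static AggNode groupBy(int key) {
        return new AggNode(null, null);
    }

    // Translates every aggregation node in the chain, from first to last.
    static List<String> plan(AggNode last) {
        List<String> steps = new ArrayList<>();
        for (AggNode n = last; n.input != null; n = n.input) {
            steps.add(n.translateToDataFlow());
        }
        Collections.reverse(steps);
        return steps;
    }

    public static void main(String[] args) {
        AggNode last = groupBy(0).min(1).andMax(1).andCnt();
        for (String step : plan(last)) {
            System.out.println(step);
        }
    }
}
```

For `groupBy(0).min(1).andMax(1).andCnt()` this prints `Map(extend to Tuple2)`, `Map(extend to Tuple3)`, and `GroupReduce(Tuple4, [min(1), max(1), cnt()])`: two extra record-copying map steps precede the one step that does real work.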
> >>
> >> This strategy seems very hackish and involves lots of unnecessary
> >> copying of tuple data. I think a better way would be to use the
> >> following syntax:
> >>
> >> ds.groupBy(0).aggregate(min(1), max(1), cnt())
> >>
> >> or
> >>
> >> ds.groupBy(0).min(1).max(1).cnt(1).aggregate()
> >>
> >> Here, there is only one method which creates a new operator, the
> >> aggregate method, and the final resultType is known when aggregate is
> >> called.
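A minimal sketch of the single-operator alternative, again in plain Java and under stated assumptions: rows are modeled as `Object[]`, the result holds the grouping key followed by one field per aggregation, and `Aggregation`, `Grouping`, `groupBy`, `min`, `max`, and `cnt` are hypothetical names rather than Flink API. Both call styles funnel into one `aggregate` method, which is the only place where work happens and where the full list of aggregations, and therefore the result arity, is known.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, Flink-free model of ds.groupBy(0).aggregate(min(1), max(1), cnt())
// and ds.groupBy(0).min(1).max(1).cnt().aggregate(). Names are illustrative only.
public class SingleAggregateSketch {

    // One aggregation over all rows of a group; produces one output field.
    interface Aggregation {
        Object apply(List<Object[]> group);
    }

    // Static factories; with a static import these read as min(1), max(1), cnt().
    static Aggregation min(int field) {
        return group -> group.stream().mapToInt(r -> (Integer) r[field]).min().getAsInt();
    }

    static Aggregation max(int field) {
        return group -> group.stream().mapToInt(r -> (Integer) r[field]).max().getAsInt();
    }

    static Aggregation cnt() {
        return group -> (long) group.size();
    }

    static Grouping groupBy(List<Object[]> rows, int keyField) {
        return new Grouping(rows, keyField);
    }

    static final class Grouping {
        private final List<Object[]> rows;
        private final int keyField;
        private final List<Aggregation> specs = new ArrayList<>();

        Grouping(List<Object[]> rows, int keyField) {
            this.rows = rows;
            this.keyField = keyField;
        }

        // Builder style: these only record the aggregation; nothing is computed yet.
        Grouping min(int field) { specs.add(SingleAggregateSketch.min(field)); return this; }
        Grouping max(int field) { specs.add(SingleAggregateSketch.max(field)); return this; }
        Grouping cnt()          { specs.add(SingleAggregateSketch.cnt()); return this; }

        // The only method that does work. At this point the full list of
        // aggregations, and so the result arity (key + one field each), is known.
        List<Object[]> aggregate(Aggregation... more) {
            List<Aggregation> all = new ArrayList<>(specs);
            all.addAll(Arrays.asList(more));

            // Group rows by key, keeping first-seen key order.
            Map<Object, List<Object[]>> groups = new LinkedHashMap<>();
            for (Object[] row : rows) {
                groups.computeIfAbsent(row[keyField], k -> new ArrayList<>()).add(row);
            }

            List<Object[]> result = new ArrayList<>();
            for (Map.Entry<Object, List<Object[]>> e : groups.entrySet()) {
                Object[] out = new Object[1 + all.size()];
                out[0] = e.getKey();
                for (int i = 0; i < all.size(); i++) {
                    out[i + 1] = all.get(i).apply(e.getValue());
                }
                result.add(out);
            }
            return result;
        }
    }

    public static void main(String[] args) {
        List<Object[]> ds = Arrays.asList(
            new Object[] {"a", 3}, new Object[] {"b", 7}, new Object[] {"a", 1},
            new Object[] {"a", 5}, new Object[] {"b", 2});

        // Varargs style and builder style go through the same aggregate method.
        for (Object[] row : groupBy(ds, 0).aggregate(min(1), max(1), cnt())) {
            System.out.println(Arrays.toString(row));
        }
        for (Object[] row : groupBy(ds, 0).min(1).max(1).cnt().aggregate()) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Both loops print `[a, 1, 5, 3]` and `[b, 2, 7, 2]`. Declaring `min`, `max`, and `cnt` as static factory methods and bringing them in with `import static` is one way a call site could stay as short as `aggregate(min(1), max(1), cnt())`.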
> >>
> >> What do you think?
> >>
> >> Best,
> >> Viktor
> >>
> >>
> >>
> >> --
> >> Sent from the Apache Flink (Incubator) mailing list archive at Nabble.com.
>
>
