Re: Hi / Aggregation support

Stephan Ewen Mon, 10 Nov 2014 03:02:47 -0800

I guess you would need static method imports to make the code look like
this, which I think is fine.
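
As a rough sketch of what that could look like (not the actual Flink API): the class AggregationSketch, the descriptor type Agg, and the signature of aggregate below are all hypothetical names made up for illustration. The factories are defined in the same class so that the file compiles on its own; in a real code base they would presumably live in a separate class and be pulled in with "import static", which gives the same unqualified call site.

```java
import java.util.Arrays;
import java.util.List;

public class AggregationSketch {

    /** Hypothetical descriptor: which aggregation to apply to which tuple field. */
    public static final class Agg {
        final String name;
        final int field; // -1 when the aggregation needs no field, as for cnt()

        Agg(String name, int field) {
            this.name = name;
            this.field = field;
        }

        @Override
        public String toString() {
            return field < 0 ? name + "()" : name + "(" + field + ")";
        }
    }

    // Static factories. With "import static some.pkg.Aggregations.*" (assumed
    // location) a caller in another class could use them unqualified as well.
    public static Agg min(int field) { return new Agg("min", field); }
    public static Agg max(int field) { return new Agg("max", field); }
    public static Agg cnt() { return new Agg("cnt", -1); }

    /**
     * Stand-in for aggregate(...) on a grouped data set. All aggregations
     * arrive in one call, so the number of result fields is known here.
     */
    public static List<Agg> aggregate(Agg... aggs) {
        return Arrays.asList(aggs);
    }

    public static void main(String[] args) {
        // Reads like the proposed ds.groupBy(0).aggregate(min(1), max(1), cnt())
        List<Agg> plan = aggregate(min(1), max(1), cnt());
        System.out.println(plan); // prints [min(1), max(1), cnt()]
    }
}
```

Because aggregate receives every descriptor at once, it is the only place that would have to construct an operator and fix its result type.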


On Mon, Nov 10, 2014 at 12:00 PM, Fabian Hueske <fhue...@apache.org> wrote:

> How/where do you plan to define the methods min(1), max(1), and cnt()?
> If these are static methods in some kind of Aggregation class, it won't
> look so concise anymore, or am I missing something here?
>
> I would be fine with both ways, the second one being nice, if it can be
> done like that.
>
> 2014-11-10 11:03 GMT+01:00 Gyula Fora <gyf...@apache.org>:
>
> > I also support this approach:
> >
> >  ds.groupBy(0).aggregate(min(1), max(1), cnt())
> >
> > I think it makes the code more readable, because it is easy to see what's
> > in the result tuple.
> >
> > Gyula
> >
> > > On 10 Nov 2014, at 10:49, Aljoscha Krettek <aljos...@apache.org> wrote:
> > >
> > > I like this version: ds.groupBy(0).aggregate(min(1), max(1), cnt()),
> > > very concise.
> > >
> > > On Mon, Nov 10, 2014 at 10:42 AM, Viktor Rosenfeld
> > > <viktor.rosenf...@tu-berlin.de> wrote:
> > >> Hi Fabian,
> > >>
> > >> I ran into a problem with your syntax example:
> > >>
> > >> DataSet<Tuple2<String, Integer>> ds = ...
> > >> DataSet<Tuple4<Tuple2<String,Integer>,Integer, Integer, Long>> result =
> > >> ds.groupBy(0).min(1).andMax(1).andCnt();
> > >>
> > >> Basically, in the example above we don't know how long the chain of
> > >> aggregation method calls is. Also, each aggregation method call adds a
> > >> field to the result tuple (the first call to groupBy returns a
> > >> Tuple1). Because the resultType of an operator is specified in the
> > >> constructor, every one of those method calls needs to create a new
> > >> Operator<OUT> with the correct result type. However, only the
> > >> translateToDataFlow method of the last method call in the chain should
> > >> actually compute the aggregation.
> > >>
> > >> This can be achieved by testing if an aggregation method is called on
> > >> an AggregationOperator. The translateToDataFlow method of the
> > >> operators in the start/middle of the chain would then just return a
> > >> MapOperatorBase which simply extends the tuple. The
> > >> translateToDataFlow method of the last operator in the chain would
> > >> return a GroupReduceOperatorBase.
> > >>
> > >> This strategy seems very hackish and involves lots of unnecessary
> > >> copying of tuple data. I think a better way would be to use the
> > >> following syntax:
> > >>
> > >> ds.groupBy(0).aggregate(min(1), max(1), cnt())
> > >>
> > >> or
> > >>
> > >> ds.groupBy(0).min(1).max(1).cnt(1).aggregate()
> > >>
> > >> Here, there is only one method which creates a new operator, the
> > >> aggregate method, and the final resultType is known when aggregate is
> > >> called.
> > >>
> > >> What do you think?
> > >>
> > >> Best,
> > >> Viktor
> > >>
> > >>
> > >>
> >
> >
>
