I guess you would need static method imports to make the code look like this, which I think is fine.
On Mon, Nov 10, 2014 at 12:00 PM, Fabian Hueske <fhue...@apache.org> wrote: > How/where do you plan to define the methods min(1), max(1), and cnt()? > If these are static methods in some kind of Aggregation class, it won't > look so concise anymore, or am I missing something here? > > I would be fine with both ways, the second one being nice, if it can be > done like that. > > 2014-11-10 11:03 GMT+01:00 Gyula Fora <gyf...@apache.org>: > > > I also support this approach: > > > > ds.groupBy(0).aggregate(min(1), max(1), cnt()) > > > > I think it makes the code more readable, because it is easy to see whats > > in the result tuple. > > > > Gyula > > > > > On 10 Nov 2014, at 10:49, Aljoscha Krettek <aljos...@apache.org> > wrote: > > > > > > I like this version: ds.groupBy(0).aggregate(min(1), max(1), cnt()), > > > very concise. > > > > > > On Mon, Nov 10, 2014 at 10:42 AM, Viktor Rosenfeld > > > <viktor.rosenf...@tu-berlin.de> wrote: > > >> Hi Fabian, > > >> > > >> I ran into a problem with your syntax example: > > >> > > >> DataSet<Tuple2<String, Integer>> ds = ... > > >> DataSet<Tuple4<Tuple2<String,Integer>,Integer, Integer, Long> > > result = > > >> ds.groupBy(0).min(1).andMax(1).andCnt(); > > >> > > >> Basically, in the example above we don't know how long the chain of > > >> aggregation method calls is. Also, each aggregation method call adds a > > >> field to the result tuple (the first call to groupBy returns a > > >> Tuple1). Because the resultType of an operator is specified in the > > >> constructur, every one of those method calls needs to create a new > > >> Operator<OUT> with the correct result type. However, only the > > >> translateToDataflow method of the last method call in the chain should > > >> actually compute the aggregation. > > >> > > >> This can be achieved by testing if an aggregation method is called on > > >> an AggregationOperator. The translateToDataFlow method of the > > >> operators in the start/middle of the chain would then just return a > > >> MapOperatorBase which simply extends the tuple. The > > >> translateToDataFlow method of the last operator in the chain would > > >> return a GroupReduceOperatorBase. > > >> > > >> This strategy seems very hackish and involves lots of unnecessary > > >> copying of tuple data. I think a better way would be to use the > > >> following syntax: > > >> > > >> ds.groupBy(0).aggregate(min(1), max(1), cnt()) > > >> > > >> or > > >> > > >> ds.groupBy(0).min(1).max(1).cnt(1).aggregate() > > >> > > >> Here, there is only one method which creates a new operator, the > > >> aggregate method, and the final resultType is known when aggregate is > > >> called. > > >> > > >> What do you think? > > >> > > >> Best, > > >> Viktor > > >> > > >> > > >> > > >> -- > > >> View this message in context: > > > http://apache-flink-incubator-mailing-list-archive.1008284.n3.nabble.com/Hi-Aggregation-support-tp2311p2429.html > > >> Sent from the Apache Flink (Incubator) Mailing List archive. mailing > > list archive at Nabble.com. > > > > >