I will go with that workaround, though I would have preferred to do it directly with the API instead of building Map/Reduce-style key/value tuples again :-)
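For the archive, the mapper workaround comes down to splitting each CSV line and pairing it with its key field, the same shape a MapFunction returning Tuple2&lt;String, String[]&gt; would produce before groupBy(0). Since I can't paste a runnable Flink job into a mail, here is the shape sketched with plain Java collections; the 54-column rows, column 15 as key, and all names are just placeholders for illustration:

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class KeyedArraySketch {

    // Mapper body: split a CSV line into its columns and pair it with
    // the grouping key (column 15) -- the same shape a MapFunction
    // returning Tuple2<String, String[]> would emit in Flink.
    static Map.Entry<String, String[]> toKeyedRecord(String csvLine) {
        String[] fields = csvLine.split(",", -1);
        return new AbstractMap.SimpleEntry<>(fields[15], fields);
    }

    // Helper to fake a 54-column row whose column 15 holds the given key.
    static String makeRow(String key) {
        String[] cols = new String[54];
        Arrays.fill(cols, "v");
        cols[15] = key;
        return String.join(",", cols);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(makeRow("a"), makeRow("b"), makeRow("a"));

        // What groupBy on the key position gives you, simulated with a
        // map from key to the records that fall into that group.
        Map<String, List<String[]>> groups = new TreeMap<>();
        for (String line : lines) {
            Map.Entry<String, String[]> rec = toKeyedRecord(line);
            groups.computeIfAbsent(rec.getKey(), k -> new ArrayList<>())
                  .add(rec.getValue());
        }
        System.out.println(groups.get("a").size()); // prints "2": two rows share key "a"
    }
}
```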
By the way, is there a simple function to count the number of items in a reduce group? It feels stupid to write a GroupReduce that just iterates and increments a counter.

cheers Martin

On Tue, Oct 21, 2014 at 2:54 PM, Robert Metzger <rmetz...@apache.org> wrote:

> Yes, for sorted groups, you need to use Pojos or Tuples.
> I think you have to split the input lines manually, with a mapper.
> How about using a TupleN<...> with only the fields you need? (returned by
> the mapper)
>
> If you need all fields, you could also use a Tuple2<String, String[]> where
> the first position is the sort key?
>
> On Tue, Oct 21, 2014 at 2:20 PM, Gyula Fora <gyf...@apache.org> wrote:
>
> > I am not sure how you should go about that, let’s wait for some feedback
> > from the others.
> >
> > Until then you can always map the array to (array, keyfield) and use
> > groupBy(1).
> >
> > > On 21 Oct 2014, at 14:17, Martin Neumann <mneum...@spotify.com> wrote:
> > >
> > > Hej,
> > >
> > > Unfortunately .sort() cannot take a key extractor, so would I have to
> > > do the sort myself then?
> > >
> > > cheers Martin
> > >
> > > On Tue, Oct 21, 2014 at 2:08 PM, Gyula Fora <gyf...@apache.org> wrote:
> > >
> > >> Hey,
> > >>
> > >> Using arrays is probably a convenient way to do so.
> > >>
> > >> I think the way you described it, groupBy only works for tuples now.
> > >> To do the grouping on the array field, you would need to create a key
> > >> extractor for this and pass that to groupBy.
> > >>
> > >> Actually we have some use-cases like this for streaming, so we are
> > >> thinking of writing a wrapper for the array types that would behave
> > >> as you described.
> > >>
> > >> Regards,
> > >> Gyula
> > >>
> > >>> On 21 Oct 2014, at 14:03, Martin Neumann <mneum...@spotify.com> wrote:
> > >>>
> > >>> Hej,
> > >>>
> > >>> I have a csv file with 54 columns, each of them a string (for now).
> > >>> I need to group and sort them on field 15.
> > >>>
> > >>> What's the best way to load the data into Flink?
> > >>> There is no Tuple54 (and the <> would look awful anyway with 54
> > >>> times String in it).
> > >>> My current idea is to write a mapper and split the string into
> > >>> arrays of strings. Would grouping and sorting work on this?
> > >>>
> > >>> So can I do something like this, or does that only work on tuples?
> > >>> Dataset<String[]> ds;
> > >>> ds.groupBy(15).sort(20, ANY)
> > >>>
> > >>> cheers Martin
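PS: the counter GroupReduce I was complaining about really is tiny, which is exactly why it feels like it should be built in. Sketched below with a minimal stand-in for Flink's Collector interface so the reduce body runs stand-alone; the surrounding Flink wiring is omitted and the type names are assumptions from memory:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GroupCountSketch {

    // Minimal stand-in for org.apache.flink.util.Collector, so the
    // reduce body below is runnable without a Flink dependency.
    interface Collector<T> {
        void collect(T record);
    }

    // The body a GroupReduceFunction<String[], Long> would have:
    // walk the group once and emit the element count.
    static void countGroup(Iterable<String[]> group, Collector<Long> out) {
        long count = 0;
        for (String[] ignored : group) {
            count++;
        }
        out.collect(count);
    }

    public static void main(String[] args) {
        List<String[]> group = Arrays.asList(
                new String[]{"a", "1"},
                new String[]{"a", "2"},
                new String[]{"a", "3"});
        List<Long> result = new ArrayList<>();
        countGroup(group, result::add);
        System.out.println("group size = " + result.get(0)); // prints "group size = 3"
    }
}
```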