By the way, do you actually need all those 54 columns in your job?
On Tue, Oct 21, 2014 at 3:02 PM, Martin Neumann <[email protected]> wrote:

> I will go with that workaround, however I would have preferred if I could
> have done that directly with the API instead of doing Map/Reduce-like
> Key/Value tuples again :-)
>
> By the way, is there a simple function to count the number of items in a
> reduce group? It feels stupid to write a GroupReduce that just iterates and
> increments a counter.
>
> cheers Martin
>
> On Tue, Oct 21, 2014 at 2:54 PM, Robert Metzger <[email protected]> wrote:
>
>> Yes, for sorted groups, you need to use Pojos or Tuples.
>> I think you have to split the input lines manually, with a mapper.
>> How about using a TupleN<...> with only the fields you need? (returned by
>> the mapper)
>>
>> If you need all fields, you could also use a Tuple2<String, String[]>
>> where the first position is the sort key?
>>
>> On Tue, Oct 21, 2014 at 2:20 PM, Gyula Fora <[email protected]> wrote:
>>
>>> I am not sure how you should go about that, let's wait for some feedback
>>> from the others.
>>>
>>> Until then you can always map the array to (array, keyfield) and use
>>> groupBy(1).
>>>
>>>> On 21 Oct 2014, at 14:17, Martin Neumann <[email protected]> wrote:
>>>>
>>>> Hej,
>>>>
>>>> Unfortunately .sort() cannot take a key extractor, would I have to do
>>>> the sort myself then?
>>>>
>>>> cheers Martin
>>>>
>>>> On Tue, Oct 21, 2014 at 2:08 PM, Gyula Fora <[email protected]> wrote:
>>>>
>>>>> Hey,
>>>>>
>>>>> Using arrays is probably a convenient way to do so.
>>>>>
>>>>> I think the way you described it, groupBy only works for tuples now.
>>>>> To do the grouping on the array field, you would need to create a key
>>>>> extractor for this and pass that to groupBy.
>>>>>
>>>>> Actually we have some use-cases like this for streaming, so we are
>>>>> thinking of writing a wrapper for the array types that would behave
>>>>> as you described.
>>>>>
>>>>> Regards,
>>>>> Gyula
>>>>>
>>>>>> On 21 Oct 2014, at 14:03, Martin Neumann <[email protected]> wrote:
>>>>>>
>>>>>> Hej,
>>>>>>
>>>>>> I have a csv file with 54 columns, each of them a string (for now).
>>>>>> I need to group and sort them on field 15.
>>>>>>
>>>>>> What's the best way to load the data into Flink?
>>>>>> There is no Tuple54 (and the <> would look awful anyway with 54
>>>>>> times String in it).
>>>>>> My current idea is to write a mapper and split the string to arrays
>>>>>> of Strings. Would grouping and sorting work on this?
>>>>>>
>>>>>> So can I do something like this, or does that only work on tuples:
>>>>>>
>>>>>> DataSet<String[]> ds;
>>>>>> ds.groupBy(15).sort(20, ANY)
>>>>>>
>>>>>> cheers Martin
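[Editor's note] Robert's suggestion above (keep the whole row as a String[] and pull the key field out front) can be sketched without a Flink dependency. The following is a plain-Java illustration of the logic only, not Flink API code; the column indices 15 and 20 come from the thread, while the class and method names, and the tiny 3-column stand-in data, are made up for the example:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupAndSort {

    // Group String[] rows by one column, then sort each group by another column,
    // mirroring what groupBy(keyCol) + a sorted group on sortCol would do after
    // mapping each row to a (key, row) pair.
    static Map<String, List<String[]>> groupAndSort(List<String[]> rows,
                                                    int groupCol, int sortCol) {
        return rows.stream()
                .collect(Collectors.groupingBy(
                        r -> r[groupCol], // the "key field" pulled out front
                        Collectors.collectingAndThen(
                                Collectors.toList(),
                                g -> {
                                    g.sort(Comparator.comparing(r -> r[sortCol]));
                                    return g;
                                })));
    }

    public static void main(String[] args) {
        // A tiny 3-column stand-in for the 54-column csv:
        // group on column 0, sort inside each group on column 1.
        List<String[]> rows = List.of(
                new String[]{"a", "2", "x"},
                new String[]{"b", "1", "y"},
                new String[]{"a", "1", "z"});

        Map<String, List<String[]>> grouped = groupAndSort(rows, 0, 1);
        // The "a" group now holds its rows ordered by column 1.
        System.out.println(grouped.get("a").get(0)[2]);
    }
}
```

In a real Flink job this would instead be a mapper emitting a Tuple2 with the key in position 0, followed by groupBy(0) and a group sort on the desired field, as discussed in the thread.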

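[Editor's note] On Martin's side question about counting items in a reduce group: the usual DataSet-era pattern was to map each element to (key, 1L) and sum the 1s per group, rather than hand-writing a GroupReduce that iterates and increments. A stdlib-only Java sketch of that shape (all names here are illustrative, not Flink API):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupCounts {

    // Count the number of items per group without iterating by hand —
    // the per-key equivalent of mapping each row to (key, 1) and summing.
    static Map<String, Long> countPerGroup(List<String[]> rows, int keyCol) {
        return rows.stream()
                .collect(Collectors.groupingBy(r -> r[keyCol],
                                               Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
                new String[]{"a", "x"},
                new String[]{"a", "y"},
                new String[]{"b", "z"});
        // a -> 2, b -> 1
        System.out.println(countPerGroup(rows, 0));
    }
}
```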