I think the issue here is grouping a DataSet of custom types on multiple fields, rather than grouping a Tuple DataSet. In that case you need a KeySelector, and you would want it to return a Tuple containing all the fields you want to group on. But as Slava said, the returned type must be comparable (which Tuples are not).
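A rough sketch of the workaround this implies, in plain Java so that it stands alone (no Stratosphere classes are used). All names here (GroupKeyDemo, GroupKey, Purchase, keyOf, sumByKey) are made up for illustration; keyOf only plays the role of what a KeySelector would return, and the TreeMap merely stands in for the grouping step, it is not how the system groups internally.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

// Hypothetical example: a small comparable key for "GROUP BY userId, productId, year".
class GroupKeyDemo {

    // Composite key; compares field by field, in declaration order.
    static final class GroupKey implements Comparable<GroupKey> {
        final long userId;
        final long productId;
        final int year;

        GroupKey(long userId, long productId, int year) {
            this.userId = userId;
            this.productId = productId;
            this.year = year;
        }

        @Override
        public int compareTo(GroupKey other) {
            int c = Long.compare(userId, other.userId);
            if (c != 0) return c;
            c = Long.compare(productId, other.productId);
            if (c != 0) return c;
            return Integer.compare(year, other.year);
        }

        // equals/hashCode are kept consistent with compareTo.
        @Override
        public boolean equals(Object o) {
            if (!(o instanceof GroupKey)) return false;
            GroupKey k = (GroupKey) o;
            return userId == k.userId && productId == k.productId && year == k.year;
        }

        @Override
        public int hashCode() {
            return Objects.hash(userId, productId, year);
        }

        @Override
        public String toString() {
            return "(" + userId + "," + productId + "," + year + ")";
        }
    }

    // Hypothetical POJO standing in for the custom type in the DataSet.
    static final class Purchase {
        final long userId;
        final long productId;
        final int year;
        final double amount;

        Purchase(long userId, long productId, int year, double amount) {
            this.userId = userId;
            this.productId = productId;
            this.year = year;
            this.amount = amount;
        }
    }

    // The key extraction a KeySelector would perform for one record.
    static GroupKey keyOf(Purchase p) {
        return new GroupKey(p.userId, p.productId, p.year);
    }

    // Local stand-in for "group by key, then sum the amounts per group".
    static Map<GroupKey, Double> sumByKey(List<Purchase> purchases) {
        Map<GroupKey, Double> totals = new TreeMap<>();
        for (Purchase p : purchases) {
            totals.merge(keyOf(p), p.amount, Double::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Purchase> data = Arrays.asList(
                new Purchase(1L, 10L, 2014, 5.0),
                new Purchase(1L, 10L, 2014, 7.5),
                new Purchase(2L, 10L, 2013, 1.0));
        System.out.println(sumByKey(data));
    }
}
```

This is exactly the kind of extra comparable entity one has to write per grouping today, which is why allowing tuples of comparable fields as keys would be nicer.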
I think it should be possible to check at optimization time whether all fields of a tuple are comparable and, if so, to allow such tuples as grouping keys. It would be good to open a JIRA for this in any case. This is a common problem when working with POJOs.

2014-06-12 0:25 GMT+02:00 Robert Metzger <[email protected]>:

> Hi Slava,
>
> I'm forwarding your message to our new mailing list at Apache:
> [email protected]
> You can subscribe to the list by sending an (empty) email to:
> [email protected].
> We are planning to shut down the stratosphere-dev@googlegroups soon.
>
> Regarding your question: When using the Tuples, you don't need to specify a
> keySelector. It is sufficient to specify the ID(s) of the keys:
>
> http://stratosphere-javadocs.github.io/eu/stratosphere/api/java/DataSet.html#groupBy(int...)
> So you should be able to do a ".groupBy(0,3,4)"
>
> Robert
>
> ---------- Forwarded message ----------
> From: Vyacheslav Zholudev <[email protected]>
> Date: Thu, Jun 12, 2014 at 12:17 AM
> Subject: [stratosphere-dev] Grouping by a tuple
> To: [email protected]
>
>
> Hi,
>
> Being used to the Hive grouping like "GROUP BY userId, productId, year" I'm
> wondering what's the best way to do it in Stratosphere? The groupBy's
> KeySelector implies that a Comparable object is returned, however, the
> obvious choice like TupleN is not comparable. In primitive cases I would
> prefer to avoid introducing comparable extra entities for grouping tuples
> of "primitive" types. Would it make sense to introduce "ComparableTupleN<T1
> extends Comparable<? extends T1>, ..., Tn extends Comparable<? extends
> Tn>>"?
>
> Or am I missing the obvious way in a Stratosphere way?
>
> Thanks,
> Vyacheslav
