I think you can do this by creating your own key type that extends IntWritable and overrides the compareTo method to sort in descending order. Cheers,
Tim

On Wed, Jun 17, 2009 at 6:34 PM, Kunsheng Chen <ke...@yahoo.com> wrote:

> Thanks, Alex! It is really helpful; at least I know the keys are sorted in
> some way.
>
> Furthermore, could I control whether the order is ascending or descending?
> Say my keys are integers and I want them in descending order: is that easy
> to do?
>
> Thanks again,
>
> -Kun
>
> --- On Mon, 6/15/09, Alex Loddengaard <a...@cloudera.com> wrote:
>
> > From: Alex Loddengaard <a...@cloudera.com>
> > Subject: Re: Anyway to sort "keys" before Reduce function in Hadoop ?
> > To: core-user@hadoop.apache.org
> > Date: Monday, June 15, 2009, 11:53 PM
> >
> > Hey Kun,
> >
> > Keys are passed to a given reducer instance in sorted order. That is,
> > for a given reducer JVM instance, the reduce function is called several
> > times, once for each key, and the keys arrive in sorted order. The
> > sorting happens in the shuffle phase, which is basically partitioning
> > and sorting. That said, if you have one reducer (which isn't practical
> > in large jobs), all keys will be given to you in sorted order.
> >
> > You may be interested in the combiner phase, which is essentially a
> > mini reduce that happens before data is transferred between mapper and
> > reducer:
> >
> > <http://wiki.apache.org/hadoop/HadoopMapReduce> (grep for "combine")
> >
> > You may also find these videos useful:
> >
> > <http://www.cloudera.com/hadoop-training-mapreduce-hdfs>
> > <http://www.cloudera.com/hadoop-training-programming-with-hadoop>
> >
> > Hope this helps. Let me know if I misunderstood your question.
> >
> > Alex
> >
> > On Mon, Jun 15, 2009 at 4:22 PM, Kunsheng Chen <ke...@yahoo.com> wrote:
> >
> > > Hi everyone,
> > >
> > > Is there any way to sort the "keys" before Reduce but after Map?
> > >
> > > I also thought of sorting the keys myself in the Reduce function, but
> > > it might take too much memory once the number of results gets large.
> > >
> > > I am thinking of using some numeric value as the "keys" in Reduce
> > > (which would be calculated by Map). If that is possible, I could
> > > easily output my results in some order.
> > >
> > > Thanks in advance,
> > >
> > > -Kun
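
For reference, the compareTo idea suggested at the top of this thread can be sketched roughly as below. This is a plain-Java stand-in so that it runs without Hadoop on the classpath: the class name DescendingIntKey and the helper sortDescending are hypothetical, and in a real job the class would presumably extend org.apache.hadoop.io.IntWritable (keeping a no-argument constructor) and be used as the map output key class. Only the reversed comparison is the point here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a key type; in a Hadoop job this would
// extend IntWritable instead of implementing Comparable directly.
public class DescendingIntKey implements Comparable<DescendingIntKey> {
    private final int value;

    public DescendingIntKey(int value) {
        this.value = value;
    }

    public int get() {
        return value;
    }

    @Override
    public int compareTo(DescendingIntKey other) {
        // Operands are swapped relative to the natural order,
        // so larger values sort before smaller ones.
        return Integer.compare(other.value, this.value);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof DescendingIntKey
                && ((DescendingIntKey) o).value == value;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(value);
    }

    // Helper for illustration only: sorts plain ints through the key type,
    // mimicking the order in which a reducer would see the keys.
    public static List<Integer> sortDescending(int... values) {
        List<DescendingIntKey> keys = new ArrayList<>();
        for (int v : values) {
            keys.add(new DescendingIntKey(v));
        }
        Collections.sort(keys);
        List<Integer> result = new ArrayList<>();
        for (DescendingIntKey k : keys) {
            result.add(k.get());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(sortDescending(3, 10, -1, 7)); // prints [10, 7, 3, -1]
    }
}
```

Using Integer.compare with swapped operands avoids the overflow that a subtraction such as other.value - this.value can hit for large magnitudes. Registering a custom comparator on the job (for example via JobConf.setOutputKeyComparatorClass) may be an alternative to a custom key type, though that route is not discussed in this thread.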