Arkady,

Isn't the partitioner just for routing map output to the right reduce bucket? What I want is for each value list in reduce to be sorted.
Pi

On Tue, Apr 15, 2008 at 7:40 PM, phonechen <[EMAIL PROTECTED]> wrote:

> Hi Arkady,
> I'm also confused about how the Hadoop framework does this job:
> turning the many <key,value> pairs output by the map() phase into
> <key, list of values> before the reduce() phase.
> For example, the map() output is:
> <hello,1>
> <hello,1>
> <world,1>
> <hello,1>
> <world,1>
> but the reduce() input is:
> <hello,[1,1,1]>
> <world,[1,1]>
> Can you point out which class takes care of this?
> Thanks very much!
>
> Best Regards,
>
> Yours
> Phonechen
>
> On 4/15/08, arkady borkovsky <[EMAIL PROTECTED]> wrote:
> >
> > look at
> > -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner
> >
> > --ab
> >
> > On Apr 14, 2008, at 4:25 PM, pi song wrote:
> >
> > > Dear people in Hadoop mailing list,
> > >
> > > Is there any way to control the value list in reduce (Key, List of
> > > values) so that it is sorted? Or at least sorted in clusters
> > > (containing clusters of sorted values, e.g. 1,1,1,2,2,2,2,3,3,3,
> > > 1,1,1,1,1,1,2,2,2,2,3, 1,1,2,2,2,3,3,3,3,3,3,3)?
> > > I had a look at JobConf.setOutputValueGroupingComparator in the
> > > javadoc and I think it might be the answer, because I feel that most
> > > of the time grouping in Hadoop is done by sorting. Am I right?
> > >
> > > Can anyone help me? How about the performance impact of your solution?
> > >
> > > Thanks in advance,
> > > Pi
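
To make the two questions in this thread concrete (how <key,value> pairs become <key, list of values>, and how each list could come out sorted), here is a minimal plain-Java sketch of the sort-then-group idea. It is a simulation only and does not use any Hadoop classes: the names SecondarySortSketch, Pair, SORT_ORDER, GROUP_ORDER and shuffle are all made up for illustration. The assumption is that the value is carried inside a composite key, the sort order compares (key, value), and grouping compares the key alone. In a real job the corresponding hooks would presumably be a partitioner on the natural key plus JobConf.setOutputKeyComparatorClass and JobConf.setOutputValueGroupingComparator, but that mapping is not tested here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java simulation of sort-then-group; this is not Hadoop code. */
class SecondarySortSketch {

    /** Hypothetical composite key: the natural key plus the value to order by. */
    static final class Pair {
        final String key;
        final int value;

        Pair(String key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Sort order: natural key first, then value (the "secondary sort"). */
    static final Comparator<Pair> SORT_ORDER =
            Comparator.comparing((Pair p) -> p.key).thenComparingInt(p -> p.value);

    /** Grouping order: only the natural key decides which records share a list. */
    static final Comparator<Pair> GROUP_ORDER = Comparator.comparing((Pair p) -> p.key);

    /** Sorts the map output, then cuts the sorted run into one value list per key. */
    static Map<String, List<Integer>> shuffle(List<Pair> mapOutput) {
        List<Pair> sorted = new ArrayList<>(mapOutput);
        sorted.sort(SORT_ORDER);

        Map<String, List<Integer>> reduceInput = new LinkedHashMap<>();
        Pair previous = null;
        List<Integer> current = null;
        for (Pair p : sorted) {
            // A new group starts whenever the grouping comparator sees a change.
            if (previous == null || GROUP_ORDER.compare(previous, p) != 0) {
                current = new ArrayList<>();
                reduceInput.put(p.key, current);
            }
            current.add(p.value);
            previous = p;
        }
        return reduceInput;
    }

    public static void main(String[] args) {
        List<Pair> mapOutput = Arrays.asList(
                new Pair("hello", 3), new Pair("world", 2), new Pair("hello", 1),
                new Pair("world", 1), new Pair("hello", 2));
        // Prints {hello=[1, 2, 3], world=[1, 2]}
        System.out.println(shuffle(mapOutput));
    }
}
```

Because grouping is just a scan over already sorted records, the sorted value lists fall out of the sort itself; the main extra cost in this sketch is comparing the value as well as the key during the sort.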
