Hi Arkady,

I'm also confused about how the Hadoop framework does this job: turning the many <key, value> pairs output by the map() phase into <key, list of values> before the reduce() phase. For example, the map() output is:

<hello,1> <hello,1> <world,1> <hello,1> <world,1>

but the reduce() input is:

<hello,[1,1,1]> <world,[1,1]>

Can you point out which class takes care of this? Thanks very much!
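To make the question concrete, here is a minimal sketch in plain Java of the transformation I mean. It is not Hadoop's code: the class and method names (GroupByKeySketch, groupByKey, pair) are made up for illustration. It only assumes that the grouping comes from sorting the map output by key and then collecting each run of equal keys into one list.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Hadoop.
public class GroupByKeySketch {

    // Small helper to build one <key, value> pair.
    public static Map.Entry<String, Integer> pair(String key, int value) {
        return new AbstractMap.SimpleEntry<>(key, value);
    }

    // Sort the map() output by key, then gather each run of equal keys
    // into a single <key, list of values> entry.
    public static Map<String, List<Integer>> groupByKey(
            List<Map.Entry<String, Integer>> mapOutput) {
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(mapOutput);
        // List.sort is stable, so values keep their emit order within a key.
        sorted.sort(Map.Entry.comparingByKey());

        Map<String, List<Integer>> grouped = new LinkedHashMap<>();
        String currentKey = null;
        List<Integer> currentValues = null;
        for (Map.Entry<String, Integer> p : sorted) {
            if (!p.getKey().equals(currentKey)) {
                // A new key starts here: open a fresh value list for it.
                currentKey = p.getKey();
                currentValues = new ArrayList<>();
                grouped.put(currentKey, currentValues);
            }
            currentValues.add(p.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput = Arrays.asList(
                pair("hello", 1), pair("hello", 1), pair("world", 1),
                pair("hello", 1), pair("world", 1));
        // prints {hello=[1, 1, 1], world=[1, 1]}
        System.out.println(groupByKey(mapOutput));
    }
}
```

In the sketch the whole list is sorted in memory in one place. In a real job the map outputs are partitioned and sorted on the map side and merged on the reduce side, so I am asking which classes do those steps.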
Best Regards,
Yours Phonechen

On 4/15/08, arkady borkovsky <[EMAIL PROTECTED]> wrote:
>
> look at
> -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner
>
> --ab
>
> On Apr 14, 2008, at 4:25 PM, pi song wrote:
> > Dear people in Hadoop mailing list,
> >
> > Is there any way to control the value list in reduce (Key, List of values) to be sorted? or at least clusteringly sorted (containing clusters of sorted values e.g. 1,1,1,2,2,2,2,3,3,3, 1,1,1,1,1,1,2,2,2,2,3 ,1,1,2,2,2,3,3,3,3,3,3,3) ?
> > I had a look at JobConf.setOutputValueGroupingComparator in javadoc and I think it might be the answer because I feel most of the time grouping in Hadoop is done by sort. Am I right?
> >
> > Can anyone help me? How about the performance impact of your solution?
> >
> > Thanks in advance,
> > Pi
