Re: Pre-sort value list in reduce

phonechen Tue, 15 Apr 2008 02:40:40 -0700

Hi Arkady,
I'm also confused about how the Hadoop framework does this job:
transforming the many <key,value> pairs output by the map() phase into
<key,list of values> before the reduce() phase.
For example, the map() output is:
<hello,1>
<hello,1>
<world,1>
<hello,1>
<world,1>
but the reduce() input is:
<hello,[1,1,1]>
<world,[1,1]>
Can you point me to the class that takes care of this?
Thanks very much!
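
To make the question concrete, here is a minimal sketch of the transformation I mean, assuming the grouping is done by sorting on the key (as the quoted message below also guesses). It is plain Python, not Hadoop code; the function name shuffle is hypothetical and only models the behaviour:

```python
from itertools import groupby
from operator import itemgetter


def shuffle(map_output):
    """Turn (key, value) pairs into (key, [values]) by sorting, then grouping."""
    # Sort on the key only. sorted() is stable, so values of equal keys
    # keep the order in which the map phase emitted them.
    ordered = sorted(map_output, key=itemgetter(0))
    # groupby() merges adjacent pairs with the same key, which is why the
    # sort has to come first.
    return [(key, [value for _, value in group])
            for key, group in groupby(ordered, key=itemgetter(0))]


map_output = [("hello", 1), ("hello", 1), ("world", 1), ("hello", 1), ("world", 1)]
print(shuffle(map_output))
# → [('hello', [1, 1, 1]), ('world', [1, 1])]
```

Note that nothing here orders the values inside each list; they simply arrive in emission order.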


Best Regards,

Yours
Phonechen

On 4/15/08, arkady borkovsky <[EMAIL PROTECTED]> wrote:
>
> look at
>  -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner
>
> --ab
>
> On Apr 14, 2008, at 4:25 PM, pi song wrote:
>
> > Dear people in Hadoop mailing list,
> >
> > Is there any way to control the value list in reduce (Key, List of values)
> > to be sorted? or at least clusteringly sorted (containing clusters of sorted
> > values e.g. 1,1,1,2,2,2,2,3,3,3, 1,1,1,1,1,1,2,2,2,2,3,
> > 1,1,2,2,2,3,3,3,3,3,3,3) ?
> > I had a look at JobConf.setOutputValueGroupingComparator in javadoc and I
> > think it might be the answer because I feel most of the time grouping in
> > Hadoop is done by sort. Am I right?
> >
> > Can anyone help me? How about the performance impact of your solution?
> >
> > Thanks in advance,
> > Pi
> >
>
>
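
For the sorted value list asked about in the quoted thread, one common approach (often called a secondary sort) is to partition on the key alone, sort on the composite (key, value), and then group on the key alone. The sketch below is plain Python under those assumptions and does not call any Hadoop API; partition and secondary_sort are hypothetical names, and the CRC32-based partitioning is only a stand-in for whatever the configured partitioner does:

```python
import zlib
from itertools import groupby


def partition(key, num_reducers):
    """Choose a reducer from the key alone, so all values of a key meet."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def secondary_sort(map_output, num_reducers=2):
    """Return, for each reducer, a list of (key, values) with sorted values."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in map_output:
        buckets[partition(key, num_reducers)].append((key, value))

    result = []
    for bucket in buckets:
        # Sort on the composite (key, value): values end up ordered per key.
        bucket.sort()
        # Group on the key only, so each key still yields one value list.
        result.append([(key, [value for _, value in group])
                       for key, group in groupby(bucket, key=lambda kv: kv[0])])
    return result


pairs = [("a", 3), ("b", 2), ("a", 1), ("a", 2), ("b", 1)]
for reducer_id, groups in enumerate(secondary_sort(pairs)):
    print(reducer_id, groups)
```

The extra cost in this sketch is only the larger sort key; whether that also holds inside Hadoop is something I have not measured.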


