Re: Pre-sort value list in reduce

pi song Tue, 15 Apr 2008 06:00:41 -0700

Arkady,

Isn't the partitioner just for routing map output to the right reduce
bucket? What I want is for each value list in reduce to be sorted.
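
For reference: getting each reduce value list sorted (usually called a secondary sort) normally takes more than a partitioner. It combines a composite key that carries the value, a partitioner that hashes only the natural key, a key comparator that sorts on the whole composite key, and a grouping comparator (the JobConf.setOutputValueGroupingComparator mentioned below) that compares only the natural key. The sketch that follows is a plain-Java simulation of that idea, not Hadoop code. CompositeKey, partition, and reduceInputs are hypothetical names; in a real job those roles would be filled by classes registered through JobConf.setPartitionerClass, JobConf.setOutputKeyComparatorClass, and JobConf.setOutputValueGroupingComparator.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of a secondary sort (hypothetical names, not Hadoop code).
class SecondarySortSketch {

    // Hypothetical composite key: the natural key plus the value to sort on.
    record CompositeKey(String naturalKey, int value) {}

    // Partitioner role: hash only the natural key, so every record for one
    // natural key goes to the same reduce partition.
    static int partition(CompositeKey k, int numReduces) {
        return (k.naturalKey().hashCode() & Integer.MAX_VALUE) % numReduces;
    }

    // Key comparator role: sort by natural key, then by value.
    static final Comparator<CompositeKey> SORT =
        Comparator.comparing(CompositeKey::naturalKey)
                  .thenComparingInt(CompositeKey::value);

    // Grouping comparator role: compare the natural key only.
    static final Comparator<CompositeKey> GROUP =
        Comparator.comparing(CompositeKey::naturalKey);

    // For one partition, returns natural key -> values in the order a
    // single reduce() call per group would see them.
    static Map<String, List<Integer>> reduceInputs(
            List<CompositeKey> mapOutput, int part, int numReduces) {
        List<CompositeKey> mine = new ArrayList<>();
        for (CompositeKey k : mapOutput) {
            if (partition(k, numReduces) == part) {
                mine.add(k);
            }
        }
        mine.sort(SORT);

        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        CompositeKey groupStart = null;
        List<Integer> bucket = null;
        for (CompositeKey k : mine) {
            // Start a new group whenever the grouping comparator says this
            // key differs from the first key of the current group.
            if (groupStart == null || GROUP.compare(groupStart, k) != 0) {
                groupStart = k;
                bucket = new ArrayList<>();
                groups.put(k.naturalKey(), bucket);
            }
            bucket.add(k.value());
        }
        return groups;
    }

    public static void main(String[] args) {
        List<CompositeKey> mapOutput = List.of(
            new CompositeKey("a", 3), new CompositeKey("b", 2),
            new CompositeKey("a", 1), new CompositeKey("a", 2),
            new CompositeKey("b", 1));
        System.out.println(reduceInputs(mapOutput, 0, 1)); // {a=[1, 2, 3], b=[1, 2]}
    }
}
```

Because the framework already sorts map output by key, this approach mainly costs a larger key rather than a separate sort pass over each value list.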


Pi

On Tue, Apr 15, 2008 at 7:40 PM, phonechen <[EMAIL PROTECTED]> wrote:

> Hi arkady,
> I'm also confused about how the Hadoop framework does this job:
> transforming the many <key,value> pairs output by the map() phase into
> <key, list of values> before the reduce() phase.
> For example, the map() output is:
> <hello,1>
> <hello,1>
> <world,1>
> <hello,1>
> <world,1>
> but the reduce() input is:
> <hello,[1,1,1]>
> <world,[1,1]>
> Can you point me to the class that takes care of this?
> Thanks very much!
>
> Best Regards,
>
> Yours
> Phonechen
>
> On 4/15/08, arkady borkovsky <[EMAIL PROTECTED]> wrote:
> >
> > look at
> >  -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner
> >
> > --ab
> >
> > On Apr 14, 2008, at 4:25 PM, pi song wrote:
> >
> > > Dear people on the Hadoop mailing list,
> > >
> > > Is there any way to make the value list in reduce (key, list of
> > > values) sorted? Or at least sorted in clusters (made up of sorted runs
> > > of values, e.g. 1,1,1,2,2,2,2,3,3,3, 1,1,1,1,1,1,2,2,2,2,3,
> > > 1,1,2,2,2,3,3,3,3,3,3,3)?
> > > I had a look at JobConf.setOutputValueGroupingComparator in the javadoc
> > > and I think it might be the answer, because I feel most of the time
> > > grouping in Hadoop is done by sorting. Am I right?
> > >
> > > Can anyone help me? What is the performance impact of your solution?
> > >
> > > Thanks in advance,
> > > Pi
> > >
> > >
> >
> >
>

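On phonechen's question: the <key, list of values> input is produced by the framework's sort and merge (roughly, map output is sorted by key inside the map task and merged again in the reduce task), after which consecutive records with equal keys are handed to one reduce() call. Below is a minimal plain-Java simulation of that sort-then-group step, using the word-count example from the thread. ShuffleSketch and sortAndGroup are hypothetical names; this is not Hadoop's own code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of sort-then-group (hypothetical names, not Hadoop code).
class ShuffleSketch {

    // Sort map output by key, then collapse each run of equal keys into one
    // <key, list of values> entry, like the input a reduce task iterates over.
    static Map<String, List<Integer>> sortAndGroup(List<Map.Entry<String, Integer>> mapOutput) {
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(mapOutput);
        // List.sort is stable, so values for equal keys keep their arrival order.
        sorted.sort(Map.Entry.comparingByKey());

        Map<String, List<Integer>> grouped = new LinkedHashMap<>();
        String prev = null;
        List<Integer> values = null;
        for (Map.Entry<String, Integer> e : sorted) {
            // A key change marks the start of the next reduce() call's input.
            if (!e.getKey().equals(prev)) {
                prev = e.getKey();
                values = new ArrayList<>();
                grouped.put(prev, values);
            }
            values.add(e.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput = List.of(
            Map.entry("hello", 1), Map.entry("hello", 1),
            Map.entry("world", 1), Map.entry("hello", 1),
            Map.entry("world", 1));
        System.out.println(sortAndGroup(mapOutput)); // {hello=[1, 1, 1], world=[1, 1]}
    }
}
```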