Hi Jones,
 Since you have run into the situation I was worried about, I have my answer now.
It seems that redesigning the map function or adding a combiner is the only way
to deal with this kind of input data.
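
To make the combiner point concrete, here is a minimal sketch in plain Python. It only simulates the shuffle and is not Hadoop code: the function names (`map_phase`, `combine`, `partition`, `shuffle`, `reduce_phase`, `run`), the split sizes, and the CRC32-based partitioner are all assumptions made up for illustration.

```python
import zlib
from collections import Counter, defaultdict


def map_phase(records):
    # Word-count style mapper: emit one (key, 1) pair per input record.
    return [(key, 1) for key in records]


def combine(pairs):
    # Local pre-aggregation on the mapper side: at most one pair per distinct key.
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())


def partition(key, num_reducers):
    # Deterministic stand-in for a hash partitioner (an assumption, not Hadoop's).
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(mapper_outputs, num_reducers):
    # Equal keys always land in the same bucket, i.e. on the same reducer.
    buckets = defaultdict(list)
    for pairs in mapper_outputs:
        for key, value in pairs:
            buckets[partition(key, num_reducers)].append((key, value))
    return buckets


def reduce_phase(buckets):
    # Sum the values for each key; sum is associative and commutative,
    # which is what makes reusing it as a combiner safe.
    totals = Counter()
    for pairs in buckets.values():
        for key, value in pairs:
            totals[key] += value
    return dict(totals)


def run(splits, num_reducers, use_combiner):
    # Returns (pairs shuffled, "hot" pairs received by its reducer, final counts).
    outputs = [map_phase(split) for split in splits]
    if use_combiner:
        outputs = [combine(pairs) for pairs in outputs]
    buckets = shuffle(outputs, num_reducers)
    shuffled = sum(len(pairs) for pairs in buckets.values())
    hot_bucket = buckets[partition("hot", num_reducers)]
    hot_pairs = sum(1 for key, _ in hot_bucket if key == "hot")
    return shuffled, hot_pairs, reduce_phase(buckets)


# Hypothetical input: 4 splits, 90% of each split's records share the key "hot".
splits = [["hot"] * 900 + ["key%d" % i for i in range(100)] for _ in range(4)]

plain = run(splits, num_reducers=4, use_combiner=False)
combined = run(splits, num_reducers=4, use_combiner=True)

print("pairs shuffled without combiner:", plain[0])
print("pairs shuffled with combiner:   ", combined[0])
print("same final counts:", plain[2] == combined[2])
```

In this toy run the reducer that owns the hot key still handles that key alone, but it receives 4 pre-summed pairs instead of 3600. This only helps when the reduce operation can be applied to partial results, as a sum can; otherwise redesigning the map output key is the remaining option.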

2010/3/31 Jones, Nick <[email protected]>

> I ran into an issue where lots of data was passing from mappers to a single
> reducer. Enabling a combiner saved quite a bit of processing time by
> reducing mapper disk writes and data movements to the reducer.
>
> Nick Jones
>
>
> ----- Original Message -----
> From: Cui tony <[email protected]>
> To: [email protected] <[email protected]>
> Sent: Tue Mar 30 21:24:18 2010
> Subject: Re: question on shuffle and sort
>
> Consider this extreme situation:
> The input data is very large, and so is the map output. If 90% of the map
> output records share the same key, all of them will be sent to one reducer
> task node. So 90% of the reduce-phase work has to be done on a single node
> rather than across the cluster. That is very inefficient and scales poorly.
>
>
> 2010/3/31 Jones, Nick <[email protected]>
>
> > Something to keep in mind, though: the sort order depends on the key type.
> > Text keys will be sorted lexicographically.
> >
> > Nick Jones
> >
> >
> > ----- Original Message -----
> > From: Ed Mazur <[email protected]>
> > To: [email protected] <[email protected]>
> > Sent: Tue Mar 30 21:07:29 2010
> > Subject: Re: question on shuffle and sort
> >
> > On Tue, Mar 30, 2010 at 9:56 PM, Cui tony wrote:
> > > Are all key-value pairs of the map output that have the same key
> > > sent to the same reducer task node?
> >
> > Yes, this is at the core of the MapReduce model. There is one call to
> > the user reduce function per unique map output key. This grouping is
> > achieved by sorting, which means you see keys in increasing order.
> >
> > Ed
> >
> >
> >
>
>
