Hi,

Can you elaborate on your case a little? If you need a sort and shuffle (i.e. the outputs of the different reducer tasks of R1 have to be aggregated in some way), you have to write another map-reduce job. If you only need to process the reducer's local data (i.e. your reducer's output key is the same as its input key), your job would be M1-R1-M2. Essentially, in Hadoop you can have only one sort-and-shuffle phase per job. Note that the chain APIs are for jobs of the form (M+RM*): one or more mappers, then a single reducer, then zero or more mappers.
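For the M1-R1-M2 case, a minimal sketch with the old org.apache.hadoop.mapred chain API could look like the following. The M1/R1/M2 classes here are just hypothetical word-count-style placeholders for your own implementations, and the input/output paths are taken from the command line:

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.mapred.lib.ChainMapper;
import org.apache.hadoop.mapred.lib.ChainReducer;

public class ChainJobDriver {

  // Placeholder M1: tokenizes each line and emits (word, 1).
  public static class M1 extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    public void map(LongWritable key, Text value,
        OutputCollector<Text, IntWritable> out, Reporter r) throws IOException {
      for (String w : value.toString().split("\\s+")) {
        if (!w.isEmpty()) out.collect(new Text(w), ONE);
      }
    }
  }

  // Placeholder R1: sums the counts. The job's single sort-and-shuffle
  // happens between M1 and this reducer.
  public static class R1 extends MapReduceBase
      implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values,
        OutputCollector<Text, IntWritable> out, Reporter r) throws IOException {
      int sum = 0;
      while (values.hasNext()) sum += values.next().get();
      out.collect(key, new IntWritable(sum));
    }
  }

  // Placeholder M2: post-processes R1's local output, no extra shuffle.
  public static class M2 extends MapReduceBase
      implements Mapper<Text, IntWritable, Text, IntWritable> {
    public void map(Text key, IntWritable value,
        OutputCollector<Text, IntWritable> out, Reporter r) throws IOException {
      if (value.get() > 1) out.collect(key, value);  // e.g. drop singletons
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(ChainJobDriver.class);
    conf.setJobName("m1-r1-m2-chain");
    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    // M1 -> (sort/shuffle) -> R1 -> M2, all within one job.
    ChainMapper.addMapper(conf, M1.class,
        LongWritable.class, Text.class, Text.class, IntWritable.class,
        true, new JobConf(false));
    ChainReducer.setReducer(conf, R1.class,
        Text.class, IntWritable.class, Text.class, IntWritable.class,
        true, new JobConf(false));
    ChainReducer.addMapper(conf, M2.class,
        Text.class, IntWritable.class, Text.class, IntWritable.class,
        false, new JobConf(false));

    JobClient.runJob(conf);
  }
}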
Amogh

On 1/20/10 2:29 AM, "Clements, Michael" <michael.cleme...@disney.com> wrote:

These two classes are not really as symmetric as their names suggest. ChainMapper does what I expected: it chains multiple map steps. But ChainReducer does not chain reducer steps; it chains map steps to follow a reduce step. At least, that is my understanding given the API docs and examples I've read.

Is there a way to chain multiple reducer steps? I've got a job that needs M-R1-R2. It currently has 2 phases: M1-R1 followed by M2-R2, where M2 is an identity pass-through mapper. If there were a way to chain 2 reduce steps the way ChainMapper chains map steps, I could make this into a one-pass job, eliminating the overhead of a second job and all the unnecessary I/O.

Thanks

Michael Clements
Solutions Architect
michael.cleme...@disney.com
206 664-4374 office
360 317 5051 mobile