What you may be looking for is a workflow system such as Oozie
(yahoo.github.com/oozie/) or Azkaban
(http://sna-projects.com/azkaban/).

If your needs are simple (2-3 jobs per workflow, not too many conditions,
etc.), you can check out the JobControl API that Hadoop offers
(http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/jobcontrol/package-summary.html).
It lets you add dependent jobs and build simple dependency chains.
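Below is a minimal sketch of a two-step chain using the old org.apache.hadoop.mapred.jobcontrol classes. It assumes Hadoop 0.20.2 is on the classpath. The class name DependentJobs, the job names and the three path arguments are hypothetical, and IdentityMapper/IdentityReducer are only stand-ins for your own map and reduce classes. step2 reads the directory that step1 writes, and JobControl does not submit it until step1 has succeeded.

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.jobcontrol.Job;
import org.apache.hadoop.mapred.jobcontrol.JobControl;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class DependentJobs {

    // One pass-through step (hypothetical); replace the identity
    // classes with your real Mapper/Reducer implementations.
    static JobConf stepConf(String name, String in, String out) {
        JobConf conf = new JobConf(DependentJobs.class);
        conf.setJobName(name);
        conf.setMapperClass(IdentityMapper.class);
        conf.setReducerClass(IdentityReducer.class);
        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(Text.class);
        FileInputFormat.setInputPaths(conf, new Path(in));
        FileOutputFormat.setOutputPath(conf, new Path(out));
        return conf;
    }

    // step2 consumes step1's output and is held back until step1 succeeds.
    static JobControl buildChain(String in, String tmp, String out) throws IOException {
        Job step1 = new Job(stepConf("step1", in, tmp));
        Job step2 = new Job(stepConf("step2", tmp, out));
        step2.addDependingJob(step1);

        JobControl control = new JobControl("two-step-chain");
        control.addJob(step1);
        control.addJob(step2);
        return control;
    }

    public static void main(String[] args) throws Exception {
        JobControl control = buildChain(args[0], args[1], args[2]);

        // JobControl is a Runnable that loops until stop() is called,
        // so drive it from its own thread and poll for completion.
        Thread runner = new Thread(control);
        runner.setDaemon(true);
        runner.start();
        while (!control.allFinished()) {
            Thread.sleep(1000);
        }
        control.stop();

        if (!control.getFailedJobs().isEmpty()) {
            System.err.println("Failed jobs: " + control.getFailedJobs());
            System.exit(1);
        }
    }
}
```

Because run() never returns on its own, the sketch polls allFinished() and then calls stop(). You would launch it with something like `hadoop jar <your-jar> DependentJobs <in> <tmp> <out>`.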

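P.S. Note that a run of map-only phases such as M-M-M-M can usually be
collapsed into a single M, since the map functions can simply be applied
one after another inside the same map task. If you want to keep one class
per phase for modularity, check out ChainMapper
(http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/lib/ChainMapper.html).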
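A minimal sketch of an M-M chain collapsed into one map phase with the old-API ChainMapper, again assuming Hadoop 0.20.2 on the classpath. ChainedMapperJob, LowerCaseMapper and TokenizeMapper are hypothetical names chosen for illustration: the first link lower-cases each line, the second splits it into (word, 1) pairs, and the stock LongSumReducer adds up the counts.

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.mapred.lib.ChainMapper;
import org.apache.hadoop.mapred.lib.LongSumReducer;

public class ChainedMapperJob {

    // Link 1 (hypothetical): lower-case each input line, keep the offset key.
    public static class LowerCaseMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, LongWritable, Text> {
        public void map(LongWritable key, Text value,
                        OutputCollector<LongWritable, Text> out, Reporter reporter)
                throws IOException {
            out.collect(key, new Text(value.toString().toLowerCase()));
        }
    }

    // Link 2 (hypothetical): split the line into words and emit (word, 1).
    public static class TokenizeMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, LongWritable> out, Reporter reporter)
                throws IOException {
            for (String word : value.toString().split("\\s+")) {
                if (!word.isEmpty()) {
                    out.collect(new Text(word), ONE);
                }
            }
        }
    }

    static JobConf buildJob(String in, String out) {
        JobConf conf = new JobConf(ChainedMapperJob.class);
        conf.setJobName("chained-mappers");
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        FileInputFormat.setInputPaths(conf, new Path(in));
        FileOutputFormat.setOutputPath(conf, new Path(out));

        // Each link's input types must match the previous link's output types.
        ChainMapper.addMapper(conf, LowerCaseMapper.class,
                LongWritable.class, Text.class, LongWritable.class, Text.class,
                true, new JobConf(false));
        ChainMapper.addMapper(conf, TokenizeMapper.class,
                LongWritable.class, Text.class, Text.class, LongWritable.class,
                true, new JobConf(false));

        conf.setReducerClass(LongSumReducer.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(LongWritable.class);
        return conf;
    }

    public static void main(String[] args) throws Exception {
        JobClient.runJob(buildJob(args[0], args[1]));
    }
}
```

Both links run inside the same map task, so nothing is written to HDFS between them. Passing byValue=true makes the chain hand copies of each record to the next link, which is the safe choice; false skips the copy but is only safe if no link modifies or holds on to records after emitting them.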

On Mon, Jul 25, 2011 at 11:50 PM, Ross Nordeen <rjnor...@mtu.edu> wrote:
>
>
> Hello all,
>
> I am trying to write an MR program where the output from the mappers is
> dependent on the previous map processes.  I understand that a job scheduler
> exists to control such processes.  Would anyone be able to give some sample
> code of a working implementation of this in Hadoop 0.20.2?
>
> --
> Ross Nordeen
> Computer Networking And Systems Administration
> Michigan Technological University
> http://www.linkedin.com/in/rjnordee

-- 
Harsh J
