For complex workflows, Oozie (or Azkaban) is indeed the answer.

-Bharath
On Wed, Feb 15, 2012 at 1:29 PM, Bharath Mundlapudi <[email protected]> wrote:

> Or you could use job chaining in MR.
> http://developer.yahoo.com/hadoop/tutorial/module4.html#chaining
>
> -Bharath
>
> On Wed, Feb 15, 2012 at 11:26 AM, John Armstrong <[email protected]> wrote:
>
>> Actually, I think this is what Oozie is for. It seems to leap out as a
>> great example of a forked workflow.
>>
>> hth
>>
>> On 02/15/2012 02:23 PM, W.P. McNeill wrote:
>>
>>> Say I have two Hadoop jobs, A and B, that can be run in parallel. I have
>>> another job, C, that takes the output of both A and B as input. I want to
>>> run A and B at the same time, wait until both have finished, and then
>>> launch C. What is the best way to do this?
>>>
>>> I know the answer if I've got a single Java client program that launches
>>> A, B, and C. But what if I don't have the option to launch all of them from
>>> a single Java program? (Say I've got a much more complicated system with
>>> many steps happening between A-B and C.) How do I synchronize between jobs,
>>> make sure there's no race conditions etc. Is this what Zookeeper is for?
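
For reference, the dependency described in the original question (run A and B in parallel, start C only after both finish) can be sketched in plain Java as below. This is a minimal illustration of the fork/join shape only, and it covers just the single-client case the question already mentions. The class name ForkJoinWorkflow, the method runWorkflow, and the Supplier "jobs" are all hypothetical stand-ins; nothing here calls a Hadoop, Oozie, or Azkaban API.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Sketch only: each "job" is a plain Supplier standing in for a Hadoop job
// submission that reports success (true) or failure (false).
class ForkJoinWorkflow {

    // Runs jobA and jobB concurrently, then runs jobC only if both succeeded.
    // Returns false, without starting jobC, if either upstream job failed.
    static boolean runWorkflow(Supplier<Boolean> jobA,
                               Supplier<Boolean> jobB,
                               Supplier<Boolean> jobC) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            CompletableFuture<Boolean> a = CompletableFuture.supplyAsync(jobA, pool);
            CompletableFuture<Boolean> b = CompletableFuture.supplyAsync(jobB, pool);
            // thenCombine fires only once both futures have completed,
            // so join() blocks until A and B are both done.
            boolean upstreamOk = a.thenCombine(b, (okA, okB) -> okA && okB).join();
            return upstreamOk && jobC.get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> log = new CopyOnWriteArrayList<>();
        boolean ok = runWorkflow(
                () -> { log.add("A done"); return true; },
                () -> { log.add("B done"); return true; },
                () -> { log.add("C done"); return true; });
        // A and B may finish in either order; C is always logged last.
        System.out.println(ok + " " + log);
    }
}
```

When the jobs cannot all be launched from one program, this in-process wait no longer applies, which is where a workflow scheduler such as Oozie comes in, as suggested in the replies above.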
