You can use Oozie for that: write a workflow job that forks A and B and then joins before C.
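
A rough sketch of what such a workflow.xml could look like is below. The node names (fork-ab, job-a, job-b, job-c, join-ab, fail) are made up for illustration, the 0.2 schema version is an assumption about your Oozie install, and ${jobTracker} and ${nameNode} are parameters you would have to supply in your job properties. The per-job configuration is left out because it depends on your jobs.

```xml
<workflow-app name="a-b-then-c" xmlns="uri:oozie:workflow:0.2">
    <start to="fork-ab"/>

    <!-- Start A and B in parallel (node names are hypothetical) -->
    <fork name="fork-ab">
        <path start="job-a"/>
        <path start="job-b"/>
    </fork>

    <action name="job-a">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <!-- <configuration> for job A goes here -->
        </map-reduce>
        <ok to="join-ab"/>
        <error to="fail"/>
    </action>

    <action name="job-b">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <!-- <configuration> for job B goes here -->
        </map-reduce>
        <ok to="join-ab"/>
        <error to="fail"/>
    </action>

    <!-- C starts only after both A and B have reached the join -->
    <join name="join-ab" to="job-c"/>

    <action name="job-c">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <!-- <configuration> for job C goes here, reading the outputs of A and B -->
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Job failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>

    <end name="end"/>
</workflow-app>
```

The join node is what does the synchronization: Oozie does not transition to job-c until every path started by the fork has arrived at the join, so you do not need to coordinate the jobs yourself.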
Thanks.

Alejandro

On Wed, Feb 15, 2012 at 11:23 AM, W.P. McNeill <[email protected]> wrote:

> Say I have two Hadoop jobs, A and B, that can be run in parallel. I have
> another job, C, that takes the output of both A and B as input. I want to
> run A and B at the same time, wait until both have finished, and then
> launch C. What is the best way to do this?
>
> I know the answer if I've got a single Java client program that launches A,
> B, and C. But what if I don't have the option to launch all of them from a
> single Java program? (Say I've got a much more complicated system with many
> steps happening between A-B and C.) How do I synchronize between jobs, make
> sure there's no race conditions etc. Is this what Zookeeper is for?
