Depending on the nature of your jobs, Cascading may fit: it has a built-in topological scheduler that runs each piece of work once its dependencies are satisfied, dependencies being source data and inter-job intermediate data.

http://www.cascading.org

The first catch is that you will still need bash to start/stop your cluster and to launch the Cascading job (as in your example below).
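For the bash side, a minimal driver sketch follows. The hadoop invocations are taken from the script below and left as comments, since their paths are specific to your install; the retry helper is a hypothetical convenience, not part of Hadoop. The two points it illustrates: foreground commands already run one after another, and the cluster should be polled for readiness before the first job is submitted.

```shell
#!/bin/bash
# Sketch of a sequential Hadoop driver (assumed paths for a 0.17-style
# install are commented out); the only logic added is the retry helper.
set -e  # abort the whole script as soon as any step fails

# Run a command until it succeeds, up to a given number of attempts,
# sleeping a fixed delay between tries.
retry() {
  attempts=$1; delay=$2; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# bin/start-all.sh
#
# Poll until the JobTracker answers instead of submitting blindly; a job
# submitted too early fails with "Job tracker still initializing".
# retry 30 10 bin/hadoop job -list

# Foreground commands run sequentially on their own; no 'wait' is needed.
# bin/hadoop jar hadoop-0.17.0-examples.jar randomwriter \
#     -D test.randomwrite.bytes_per_map=107374182 rand
# bin/hadoop jar hadoop-0.17.0-examples.jar randomtextwriter \
#     -D test.randomtextwrite.total_bytes=107374182 rand-text
# bin/stop-all.sh
echo "finished hdfs randomwriter experiment"
```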

The second catch is that you currently must use the Cascading API (or the Groovy API) to assemble your data-processing flows. Hopefully in the next couple of weeks we will have a means of supporting custom/raw Hadoop jobs as members of a set of dependent jobs.

This feature is being delayed by our adding support for stream assertions: the ability to validate data at runtime, but have the assertions 'planned' out of the process flow on demand, e.g. for production runs.

And for stream traps: built-in support for siphoning off bad data into side files, so long-running (or low-fidelity) jobs can continue without losing any data.

You can read more about these features here:
http://groups.google.com/group/cascading-user

ckw

On Jun 10, 2008, at 2:48 PM, Meng Mao wrote:

I'm interested in the same thing -- is there a recommended way to batch
Hadoop jobs together?

On Tue, Jun 10, 2008 at 5:45 PM, Richard Zhang <[EMAIL PROTECTED]> wrote:

Hello folks:
I am running several Hadoop applications on HDFS. To save the effort of issuing the set of commands every time, I am trying to use a bash script to run the applications sequentially. To let each job finish before proceeding to the next, I am using wait in the script, like below.

sh bin/start-all.sh
wait
echo cluster start
(bin/hadoop jar hadoop-0.17.0-examples.jar randomwriter -D test.randomwrite.bytes_per_map=107374182 rand)
wait
bin/hadoop jar hadoop-0.17.0-examples.jar randomtextwriter -D test.randomtextwrite.total_bytes=107374182 rand-text
bin/stop-all.sh
echo finished hdfs randomwriter experiment


However, it always gives an error like the one below. Does anyone have a better idea of how to run multiple sequential jobs with a bash script?

HadoopScript.sh: line 39: wait: pid 10 is not a child of this shell
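[A note on the wait error above: bash's wait only waits for background children of the current shell, i.e. processes started with &. Every hadoop command in the script runs in the foreground (the parentheses merely wrap one of them in a subshell), so the shell has nothing to wait for, and the wait calls can simply be dropped; sequencing is already guaranteed. A small demonstration of the semantics:]

```shell
# bash 'wait' semantics in miniature.
sleep 1 &                  # start a background child of this shell
wait $!                    # wait for exactly that child; status 0 on success
echo "background child reaped"

true                       # a foreground command: the shell has already
                           # waited for it before reading the next line
wait                       # no background jobs pending: returns immediately
echo "no pending jobs"
```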

org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.mapred.JobTracker$IllegalStateException: Job tracker still initializing
      at org.apache.hadoop.mapred.JobTracker.ensureRunning(JobTracker.java:1722)
      at org.apache.hadoop.mapred.JobTracker.getNewJobId(JobTracker.java:1730)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
      at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)

      at org.apache.hadoop.ipc.Client.call(Client.java:557)
      at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:212)
      at $Proxy1.getNewJobId(Unknown Source)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
      at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
      at $Proxy1.getNewJobId(Unknown Source)
      at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:696)
      at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:973)
      at org.apache.hadoop.examples.RandomWriter.run(RandomWriter.java:276)
      at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
      at org.apache.hadoop.examples.RandomWriter.main(RandomWriter.java:287)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
      at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
      at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:53)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
      at org.apache.hadoop.mapred.JobShell.run(JobShell.java:194)
      at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
      at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
      at org.apache.hadoop.mapred.JobShell.main(JobShell.java:220)




--
hustlin, hustlin, everyday I'm hustlin

--
Chris K Wensel
[EMAIL PROTECTED]
http://chris.wensel.net/
http://www.cascading.org/




