Depending on the nature of your jobs, Cascading may fit: it has a built-in topological scheduler that runs each piece of work once its dependencies are satisfied, dependencies being source data and inter-job intermediate data.

http://www.cascading.org

The first catch is that you will still need bash to start/stop your cluster and to launch the Cascading job (as in your example below).
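For the bash side, a minimal driver sketch follows. The hadoop invocations are taken from the script below and left as comments, since their paths are specific to your install; the retry helper is a hypothetical convenience, not part of Hadoop. The two points it illustrates: foreground commands already run one after another, and the cluster should be polled for readiness before the first job is submitted.

```shell
#!/bin/bash
# Sketch of a sequential Hadoop driver (assumed paths for a 0.17-style
# install are commented out); the only logic added is the retry helper.
set -e  # abort the whole script as soon as any step fails

# Run a command until it succeeds, up to a given number of attempts,
# sleeping a fixed delay between tries.
retry() {
  attempts=$1; delay=$2; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# bin/start-all.sh
#
# Poll until the JobTracker answers instead of submitting blindly; a job
# submitted too early fails with "Job tracker still initializing".
# retry 30 10 bin/hadoop job -list

# Foreground commands run sequentially on their own; no 'wait' is needed.
# bin/hadoop jar hadoop-0.17.0-examples.jar randomwriter \
#     -D test.randomwrite.bytes_per_map=107374182 rand
# bin/hadoop jar hadoop-0.17.0-examples.jar randomtextwriter \
#     -D test.randomtextwrite.total_bytes=107374182 rand-text
# bin/stop-all.sh
echo "finished hdfs randomwriter experiment"
```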

The second catch is that you currently must use the Cascading API (or the Groovy API) to assemble your data-processing flows. Hopefully in the next couple of weeks we will have a means of supporting custom/raw Hadoop jobs as members of a set of dependent jobs.

This feature is being delayed by our adding support for stream assertions: the ability to validate data at runtime, but have the assertions 'planned' out of the process flow on demand, e.g. for production runs.

And for stream traps: built-in support for siphoning off bad data into side files, so long-running (or low-fidelity) jobs can continue without losing any data.

You can read more about these features here:
http://groups.google.com/group/cascading-user

ckw

On Jun 10, 2008, at 2:48 PM, Meng Mao wrote:

I'm interested in the same thing -- is there a recommended way to batch
Hadoop jobs together?

On Tue, Jun 10, 2008 at 5:45 PM, Richard Zhang <[EMAIL PROTECTED]> wrote:

Hello folks:
I am running several Hadoop applications on HDFS. To save the effort of issuing the set of commands every time, I am trying to use a bash script to run the applications sequentially. To let each job finish before proceeding to the next, I am using wait in the script, like below.

sh bin/start-all.sh
wait
echo cluster start
(bin/hadoop jar hadoop-0.17.0-examples.jar randomwriter -D test.randomwrite.bytes_per_map=107374182 rand)
wait
bin/hadoop jar hadoop-0.17.0-examples.jar randomtextwriter -D test.randomtextwrite.total_bytes=107374182 rand-text
bin/stop-all.sh
echo finished hdfs randomwriter experiment


However, it always gives an error like the one below. Does anyone have a better idea of how to run multiple sequential jobs with a bash script?

HadoopScript.sh: line 39: wait: pid 10 is not a child of this shell
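[A note on the wait error above: bash's wait only waits for background children of the current shell, i.e. processes started with &. Every hadoop command in the script runs in the foreground (the parentheses merely wrap one of them in a subshell), so the shell has nothing to wait for, and the wait calls can simply be dropped; sequencing is already guaranteed. A small demonstration of the semantics:]

```shell
# bash 'wait' semantics in miniature.
sleep 1 &                  # start a background child of this shell
wait $!                    # wait for exactly that child; status 0 on success
echo "background child reaped"

true                       # a foreground command: the shell has already
                           # waited for it before reading the next line
wait                       # no background jobs pending: returns immediately
echo "no pending jobs"
```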

org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.mapred.JobTracker$IllegalStateException: Job tracker still initializing
      at org.apache.hadoop.mapred.JobTracker.ensureRunning(JobTracker.java:1722)
      at org.apache.hadoop.mapred.JobTracker.getNewJobId(JobTracker.java:1730)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
      at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)

      at org.apache.hadoop.ipc.Client.call(Client.java:557)
      at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:212)
      at $Proxy1.getNewJobId(Unknown Source)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
      at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
      at $Proxy1.getNewJobId(Unknown Source)
      at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:696)
      at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:973)
      at org.apache.hadoop.examples.RandomWriter.run(RandomWriter.java:276)
      at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
      at org.apache.hadoop.examples.RandomWriter.main(RandomWriter.java:287)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
      at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
      at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:53)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
      at org.apache.hadoop.mapred.JobShell.run(JobShell.java:194)
      at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
      at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
      at org.apache.hadoop.mapred.JobShell.main(JobShell.java:220)




--
hustlin, hustlin, everyday I'm hustlin

--
Chris K Wensel
[EMAIL PROTECTED]
http://chris.wensel.net/
http://www.cascading.org/




