Harsh,

Thanks.
(1) It seems I need more shell knowledge here. For example, to submit several jobs at the same time, I have a shell script batch.sh with just three lines:

bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+' &
bin/hadoop jar hadoop-examples-*.jar grep input output2 'dfs[a-z.]+' &
bin/hadoop jar hadoop-examples-*.jar grep input output3 'dfs[a-z.]+' &

Is this what you mean by fork? Could you please give a short example showing how to use "$!" for monitoring? I could then google it and learn further myself. (I have put my rough guess at the bottom of this mail, below the quoted thread.)

(2) Sorry for the confusion here. Do you think it is feasible to control the interval between job submissions in a shell script (or Python, etc.) if I don't change the Java code? For instance, in the above example, can I make the 2nd job run 1 second after the submission of the 1st job, and then submit the last job 5 seconds after the 2nd one? (A sketch of what I have in mind is also at the bottom.)

(3) By the way, I also tried to modify the Java code to run several grep jobs in hadoop-examples-*.jar. I modified Grep.java in src/examples/org/apache/hadoop/examples and named the new file EthanGrep.java: I removed the main method and replaced JobClient.runJob(grepJob) with JobClient.submitJob(grepJob) in run(String[] args). Then I wrote an EthanTest.java:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ToolRunner;

public class EthanTest {
  public static void main(String[] args) throws Exception {
    for (int x = 0; x < 2; x++) {
      int res = ToolRunner.run(new Configuration(), new EthanGrep(), args);
    }
  }
}

It compiles without errors, but when I then try:

bin/hadoop jar hadoop-examples-*.jar EthanTest

it complains that no such program "EthanTest" is found and that the valid names are grep and so on. It seems I am missing something here?

Thanks again,
Ethan

On Sun, Apr 22, 2012 at 4:59 PM, Harsh J <ha...@cloudera.com> wrote:
> Hey Ethan,
>
> First question: Yes, that is what I meant.
>
> Second question: When you do a fork, the PID of the last command from
> the script is stored a "$!" variable. You can grab these each time you
> do a fork and then monitor them (at least PID-wise).
>
> I'm still not sure what you mean by "especially if I need to run
> different kinds of jobs and control the inter-arrival time?" actually
> but forking is the answer to your other need, if you can't change
> code.
>
> On Sun, Apr 22, 2012 at 11:42 PM, brisk <mylinq...@gmail.com> wrote:
> > Hi, Harsh,
> >
> > Thanks so much for your answer!
> >
> > By "run multiple command lines using a fork and managing them
> > afterwards",
> > do you mean just put "&" at the end of each command and let each command
> > line run in the background? Then what do you mean by "managing them
> > afterwards"?
> >
> > Best,
> > Ethan
> >
> > On Sun, Apr 22, 2012 at 12:02 PM, Harsh J <ha...@cloudera.com> wrote:
> >>
> >> Is your requirement to not have the job launcher program return until
> >> completion? For that you should either edit the java sources to not
> >> waitForCompletion(…) (and just submit()), or run multiple command
> >> lines using a fork and managing them afterwards.
> >>
> >> For example you can do:
> >> bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+' &
> >>
> >> And the process should run in the background until termination,
> >> allowing you to run another without needing to open a new terminal.
> >>
> >> Is this what you're looking for?
> >>
> >> On Sun, Apr 22, 2012 at 9:51 PM, brisk <mylinq...@gmail.com> wrote:
> >> > Hi,
> >> >
> >> > Does anybody know how to submit multiple hadoop jobs without opening
> >> > multiple terminals?
> >> > I found one method is to use Job.Submit() in ToolRunner.run(),
> >> > but can I use a shell script to submit jobs (with command like
> >> > "bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+' ")
> >> > instead of modifying java files/source code,
> >> > especially if I need to run different kinds of jobs and control the
> >> > inter-arrival time?
> >> >
> >> > Thanks,
> >> > Ethan
> >>
> >>
> >> --
> >> Harsh J
> >
> >
> >
> --
> Harsh J
>
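
P.S. As mentioned under (1) above, here is my rough guess at how "$!" would be used; the variable names (PID1, PID2, STATUS1, STATUS2) are just my own invention, so please correct me if this is not what you meant:

#!/bin/sh
# Submit the first job in the background and grab the launcher's PID via "$!".
bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+' &
PID1=$!
# Second job, second PID.
bin/hadoop jar hadoop-examples-*.jar grep input output2 'dfs[a-z.]+' &
PID2=$!
# Block until each launcher process exits and record its exit code.
wait $PID1
STATUS1=$?
wait $PID2
STATUS2=$?
echo "job 1 exit code: $STATUS1, job 2 exit code: $STATUS2"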
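
And for (2), I was hoping that something as simple as sleep between the submissions would be enough (again just a sketch, I have not tested it on my cluster yet):

#!/bin/sh
# First job right away.
bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+' &
# Second job 1 second after the first.
sleep 1
bin/hadoop jar hadoop-examples-*.jar grep input output2 'dfs[a-z.]+' &
# Third job 5 seconds after the second.
sleep 5
bin/hadoop jar hadoop-examples-*.jar grep input output3 'dfs[a-z.]+' &
# Wait for all three launchers before the script exits.
wait

If that is reasonable, I could later read the intervals from a file instead of hard-coding them.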