Something else you'll want to be careful of: while the mapper can certainly submit a job (JobClient.submitJob()) to run on the same cluster, you might deadlock yourself if you block on the return of the subsidiary job with JobClient.runJob(). Depending on the scheduler in use, the cluster configuration, etc., your original job might be occupying all the task slots, so the second job never gets started. And ditto to the other comments about multiple task attempts, etc., potentially causing trouble with multiple enqueues.
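To make the distinction concrete, here is a minimal sketch of the two submission styles in the old `org.apache.hadoop.mapred` API. The class name, job name, and configuration details are placeholders, and this is only an illustration of the blocking vs. non-blocking calls, not a complete job driver:

```java
// Sketch only: contrasts JobClient.submitJob() (non-blocking) with
// JobClient.runJob() (blocking) in the old mapred API. Job names and
// configuration are hypothetical; a real job also needs mapper,
// input/output paths, etc.
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class SubmitVsRun {
    public static void main(String[] args) throws Exception {
        JobConf subConf = new JobConf(SubmitVsRun.class);
        subConf.setJobName("subsidiary-job");
        // ... set mapper, reducer, input/output paths here ...

        // Non-blocking: hands the job to the JobTracker and returns a
        // handle immediately. This is what a task could call without
        // waiting, though re-executed task attempts may submit twice.
        JobClient client = new JobClient(subConf);
        RunningJob handle = client.submitJob(subConf);

        // Blocking: waits until the job completes. Called from inside a
        // map task whose parent job holds all the task slots, this is
        // where the deadlock described above would occur.
        // JobClient.runJob(subConf);
    }
}
```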
I think that your application's structure will be much easier to debug if you do the first pass in the first job and write out a "signal" file in HDFS when a second job is needed. The client program then waits for the first job to end, and enqueues the second job based on the contents of the signal file.

- Aaron

On Sat, Dec 6, 2008 at 7:55 PM, tim robertson <[EMAIL PROTECTED]> wrote:
> Of course you are quite correct.
> On job submission from a Map you would need to check if the Job has
> been submitted from elsewhere (a previously killed job etc).
>
> The only time I think this might be useful is to prioritise the
> secondary jobs higher than the primary first pass... otherwise there
> is probably no benefit over just waiting for the first pass to finish.
>
> On Sat, Dec 6, 2008 at 6:41 PM, Devaraj Das <[EMAIL PROTECTED]> wrote:
> >
> > On 12/6/08 10:43 PM, "tim robertson" <[EMAIL PROTECTED]> wrote:
> >
> >> I don't agree that this would be considered unconventional, as I have
> >> scenarios where this makes sense too - one file with a summary view,
> >> and others that are very detailed, and a pass over the first one
> >> determines which ones to analyse properly in a second job.
> >>
> > If you're running the first job to do just the first pass (the output of
> > which is the list of documents that you want to analyze properly in the
> > second job), then yes, this is okay (and this is what I hinted at in my
> > earlier mail). However, if in your first job itself you want to launch
> > the second job, this would be unconventional IMO. Things may not be
> > deterministic - for example, take a case where a map from the first job
> > launches the second job, and then the map dies for whatever reason. The
> > second execution of the same task (Hadoop would launch a second attempt)
> > would launch the second job again, and this may not be what you want...
> >
> >> I am a novice, but it looks like the slaves know about the Master
> >> NameNode and JobTracker (in the masters file), so I think it is
> >> worth trying.
> >>
> >> Cheers,
> >> Tim
> >>
> >> On Sat, Dec 6, 2008 at 5:17 PM, Devaraj Das <[EMAIL PROTECTED]> wrote:
> >>>
> >>> On 12/6/08 2:42 PM, "deng chao" <[EMAIL PROTECTED]> wrote:
> >>>
> >>>> Hi,
> >>>> We have met a case that needs your help.
> >>>> The case: in the Mapper class, named MapperA, we define a map()
> >>>> function, and in this map() function we want to submit another new
> >>>> job, named jobB. Does Hadoop support this case?
> >>>
> >>> Although you can, the design of your application would be
> >>> unconventional. Please see if you can redesign your application so
> >>> that it doesn't have to do this. Couldn't you run some algorithm on
> >>> the client side and, depending on that output, submit a job? The other
> >>> case you might want to consider is a series of jobs where the output
> >>> of one job is the input of another...
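Devaraj's alternative of a series of jobs, where the output of one job is the input of the next, can be sketched as follows with the old mapred API. All paths and job names here are hypothetical, and the mapper/reducer configuration is elided:

```java
// Sketch only: a linear chain of two jobs run from the client, where
// the first job's output directory is the second job's input directory.
// Paths, job names, and omitted mapper/reducer setup are placeholders.
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class ChainedJobs {
    public static void main(String[] args) throws Exception {
        Path input = new Path("/data/raw");
        Path intermediate = new Path("/data/summary");
        Path output = new Path("/data/detailed-analysis");

        JobConf summaryJob = new JobConf(ChainedJobs.class);
        summaryJob.setJobName("summary-pass");
        FileInputFormat.setInputPaths(summaryJob, input);
        FileOutputFormat.setOutputPath(summaryJob, intermediate);
        // Blocking here is safe: the client holds no task slots.
        JobClient.runJob(summaryJob);

        JobConf detailJob = new JobConf(ChainedJobs.class);
        detailJob.setJobName("detail-pass");
        // The second job reads what the first job wrote.
        FileInputFormat.setInputPaths(detailJob, intermediate);
        FileOutputFormat.setOutputPath(detailJob, output);
        JobClient.runJob(detailJob);
    }
}
```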

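The "signal file" approach Aaron describes can be sketched like this, again with the old mapred API. The signal path, job names, and configuration are hypothetical; the point is only that the blocking wait and the decision to enqueue a second job both happen on the client, never inside a task:

```java
// Sketch only: client-side driver for the two-pass "signal file"
// pattern. The first job's tasks are assumed to create the signal file
// in HDFS only when a second pass is needed; all names are placeholders.
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class TwoPassDriver {
    public static void main(String[] args) throws Exception {
        JobConf firstPass = new JobConf(TwoPassDriver.class);
        firstPass.setJobName("first-pass");
        // ... configure mapper, reducer, input/output paths ...

        // The client blocks on the first job; no deadlock risk, since
        // the client occupies no task slots on the cluster.
        JobClient.runJob(firstPass);

        // Check the signal file the first job's tasks may have written.
        FileSystem fs = FileSystem.get(firstPass);
        Path signal = new Path("/tmp/first-pass/_NEEDS_SECOND_PASS");
        if (fs.exists(signal)) {
            JobConf secondPass = new JobConf(TwoPassDriver.class);
            secondPass.setJobName("second-pass");
            // ... configure the second job, e.g. from the signal
            //     file's contents ...
            JobClient.runJob(secondPass);
        }
    }
}
```

This also sidesteps the duplicate-submission problem from task re-attempts: a re-run map task can at worst rewrite the signal file, which is idempotent, whereas a map task that submits a job directly would submit it once per attempt.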