Hadoop typically complains if you try to reuse a JobConf object by modifying job parameters (Mapper, Reducer, output path, etc.) and resubmitting it to the job client. You should create a new JobConf object for every map-reduce job, and if some parameters should be copied from the previous job, then you should do:
JobConf newJob = new JobConf(oldJob, MyClass.class);
...(your changes to newJob) ...
JobClient.runJob(newJob)

This works for me.

-----Original Message-----
From: Mori Bellamy [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 15, 2008 4:27 AM
To: [email protected]
Subject: Re: How to chain multiple hadoop jobs?

Weird. I use eclipse, but that's never happened to me. When you set up
your JobConfs, for example:

JobConf conf2 = new JobConf(getConf(),MyClass.class)

is your "MyClass" in the same package as your driver program? also, do
you run from eclipse or from the command line (i've never tried to
launch a hadoop task from eclipse). if you run from the command line:

hadoop jar MyMRTaskWrapper.jar myEntryClass option1 option2...

and all of the requisite resources are in MyMRTaskWrapper.jar, i don't
see what the problem would be. if this is the way you run a hadoop
task, are you sure that all of the resources are getting compiled into
the same jar? when you export a jar from eclipse, it won't pack up
external resources by default. (look into addons like FatJAR for that).

On Jul 14, 2008, at 2:25 PM, Sean Arietta wrote:

>
> Well that's what I need to do also... but Hadoop complains to me when
> I attempt to do that. Are you using Eclipse by any chance to develop?
> The error I'm getting seems to be stemming from the fact that Hadoop
> thinks I am uploading a new jar for EVERY execution of
> JobClient.runJob() so it fails indicating the job jar file doesn't
> exist. Did you have to turn something on/off to get it to ignore that
> or are you using a different IDE?
> Thanks!
>
> Cheers,
> Sean
>
>
> Mori Bellamy wrote:
>>
>> hey sean,
>>
>> i later learned that the method i originally posted (configuring
>> different JobConfs and then running them, blocking style, with
>> JobClient.runJob(conf)) was sufficient for my needs. the reason it
>> was failing before was somehow my fault and the bugs somehow got
>> fixed x_X.
>>
>> Lukas gave me a helpful reply pointing me to TestJobControl.java (in
>> the hadoop source directory). it seems like this would be helpful if
>> your job dependencies are complex. but for me, i just need to do one
>> job after another (and every job only depends on the one right before
>> it), so the code i originally posted works fine.
>>
>> On Jul 14, 2008, at 1:38 PM, Sean Arietta wrote:
>>
>>>
>>> Could you please provide some small code snippets elaborating on how
>>> you implemented that? I have a similar need as the author of this
>>> thread and I would appreciate any help. Thanks!
>>>
>>> Cheers,
>>> Sean
>>>
>>>
>>> Joman Chu-2 wrote:
>>>>
>>>> Hi, I use Toolrunner.run() for multiple MapReduce jobs. It seems to
>>>> work well. I've run sequences involving hundreds of MapReduce jobs
>>>> in a for loop and it hasn't died on me yet.
>>>>
>>>> On Wed, July 9, 2008 4:28 pm, Mori Bellamy said:
>>>>> Hey all, I'm trying to chain multiple mapreduce jobs together to
>>>>> accomplish a complex task. I believe that the way to do it is as
>>>>> follows:
>>>>>
>>>>> JobConf conf = new JobConf(getConf(), MyClass.class);
>>>>> //configure job.... set mappers, reducers, etc
>>>>> SequenceFileOutputFormat.setOutputPath(conf,myPath1);
>>>>> JobClient.runJob(conf);
>>>>>
>>>>> //new job
>>>>> JobConf conf2 = new JobConf(getConf(),MyClass.class)
>>>>> SequenceFileInputFormat.setInputPath(conf,myPath1);
>>>>> //more configuration...
>>>>> JobClient.runJob(conf2)
>>>>>
>>>>> Is this the canonical way to chain jobs? I'm having some trouble
>>>>> with this method -- for especially long jobs, the latter MR tasks
>>>>> sometimes do not start up.
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Joman Chu
>>>> AIM: ARcanUSNUMquam
>>>> IRC: irc.liquid-silver.net
>>>>
>>>>
>>>>
>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/How-to-chain-multiple-hadoop-jobs--tp18370089p18452309.html
>>> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>>>
>>
>>
>>
>
> --
> View this message in context:
> http://www.nabble.com/How-to-chain-multiple-hadoop-jobs--tp18370089p18453200.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>
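
For anyone reading this thread later: the advice at the top of this message (a fresh configuration object per job, copied from the previous one, with each job reading what the previous job wrote) can be sketched without a cluster. This is only an illustration under assumptions. Conf, submit and runChain are hypothetical stand-ins written for this example and are not Hadoop classes; in a real driver their roles would be played by JobConf and JobClient.runJob as quoted above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ChainedJobsSketch {

    /** Hypothetical stand-in for a job configuration; NOT Hadoop's JobConf. */
    static final class Conf {
        private final Map<String, String> props = new HashMap<>();

        Conf() {}

        /** Copy constructor: the new object starts with the other's settings. */
        Conf(Conf other) {
            props.putAll(other.props);
        }

        void set(String key, String value) {
            props.put(key, value);
        }

        String get(String key) {
            return props.get(key);
        }
    }

    /** Hypothetical stand-in for a blocking submit; it only describes the job. */
    static String submit(Conf conf) {
        return conf.get("mapper") + ": " + conf.get("input") + " -> " + conf.get("output");
    }

    /** Runs the stages one after another, each with its own Conf copied from base. */
    static List<String> runChain(Conf base, String firstInput, String[] mappers) {
        List<String> log = new ArrayList<>();
        String input = firstInput;
        for (int i = 0; i < mappers.length; i++) {
            Conf job = new Conf(base);        // fresh object per job, never resubmitted
            String output = "out" + (i + 1);
            job.set("mapper", mappers[i]);
            job.set("input", input);          // set on this job's own Conf
            job.set("output", output);
            log.add(submit(job));             // blocks until the stage is "done"
            input = output;                   // the next job reads this job's output
        }
        return log;
    }

    public static void main(String[] args) {
        Conf base = new Conf();
        base.set("queue", "default");         // shared setting inherited by every job
        for (String line : runChain(base, "in", new String[] {"StageA", "StageB"})) {
            System.out.println(line);
        }
    }
}
```

Running main prints one line per stage. The base object is never modified by the loop, so it can serve as the template for any number of jobs, which is the point of copying instead of reusing a submitted configuration.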
