RE: How to chain multiple hadoop jobs?

Goel, Ankur Tue, 15 Jul 2008 01:01:47 -0700

Hadoop typically complains if you try to reuse a JobConf object by
modifying job parameters (Mapper, Reducer, output path, etc.) and
resubmitting it to the job client. You should create a new JobConf
object for every map-reduce job, and if some parameters should be
copied from the previous job, do:


JobConf newJob = new JobConf(oldJob, MyClass.class);
...(your changes to newJob) ...
JobClient.runJob(newJob);

This works for me.
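
For the simple case discussed in this thread, where each job depends only on the one right before it, the driver logic can be sketched without any Hadoop classes. This is a minimal sketch and not Hadoop API: Stage, JobChain, and the "stage-N" directory naming are hypothetical names made up for illustration. In a real driver, each Stage would build its own fresh JobConf (as above) and block on JobClient.runJob.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for one map-reduce job: reads input, writes output. */
interface Stage {
    /** Returns true if the job succeeded. */
    boolean run(String input, String output);
}

final class JobChain {
    /**
     * Runs the stages strictly one after another. Stage i reads the
     * directory that stage i-1 wrote. Returns the last output path.
     */
    static String runAll(List<Stage> stages, String input, String workDir) {
        String current = input;
        for (int i = 0; i < stages.size(); i++) {
            String output = workDir + "/stage-" + i;
            // In a real driver, this is where a NEW JobConf would be created
            // for the job, its input/output paths set, and the job submitted
            // with a blocking call such as JobClient.runJob(newJob).
            if (!stages.get(i).run(current, output)) {
                // Fail fast: later stages would only read missing data.
                throw new IllegalStateException("stage " + i + " failed");
            }
            current = output; // the next stage consumes this output
        }
        return current;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        // A fake stage that only records which paths it was given.
        Stage recorder = (in, out) -> {
            log.add(in + " -> " + out);
            return true;
        };
        String last = runAll(Arrays.asList(recorder, recorder), "input", "work");
        System.out.println(log);
        System.out.println("final output: " + last);
    }
}
```

The point of the loop is that nothing is carried over between iterations except the path, so no job configuration object is ever resubmitted.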

-----Original Message-----
From: Mori Bellamy [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, July 15, 2008 4:27 AM
To: [email protected]
Subject: Re: How to chain multiple hadoop jobs?

Weird. I use Eclipse, but that's never happened to me. When you set up
your JobConfs, for example:

JobConf conf2 = new JobConf(getConf(), MyClass.class);

is your "MyClass" in the same package as your driver program? Also, do
you run from Eclipse or from the command line? (I've never tried to
launch a Hadoop task from Eclipse.) If you run from the command line:

hadoop jar MyMRTaskWrapper.jar myEntryClass option1 option2...

and all of the requisite resources are in MyMRTaskWrapper.jar, I don't
see what the problem would be. If this is the way you run a Hadoop task,
are you sure that all of the resources are getting compiled into the
same jar? When you export a jar from Eclipse, it won't pack up external
resources by default (look into add-ons like FatJAR for that).


On Jul 14, 2008, at 2:25 PM, Sean Arietta wrote:

>
> Well that's what I need to do also... but Hadoop complains to me when 
> I attempt to do that. Are you using Eclipse by any chance to develop?
> The error I'm getting seems to be stemming from the fact that Hadoop 
> thinks I am uploading a new jar for EVERY execution of 
> JobClient.runJob() so it fails indicating the job jar file doesn't 
> exist. Did you have to turn something on/off to get it to ignore that 
> or are you using a different IDE?
> Thanks!
>
> Cheers,
> Sean
>
>
> Mori Bellamy wrote:
>>
>> hey sean,
>>
>> i later learned that the method i originally posted (configuring 
>> different JobConfs and then running them, blocking style, with
>> JobClient.runJob(conf)) was sufficient for my needs. the reason it 
>> was failing before was somehow my fault and the bugs somehow got 
>> fixed x_X.
>>
>> Lukas gave me a helpful reply pointing me to TestJobControl.java (in 
>> the hadoop source directory). it seems like this would be helpful if 
>> your job dependencies are complex. but for me, i just need to do one 
>> job after another (and every job only depends on the one right before
>> it), so the code i originally posted works fine.
>>
>> On Jul 14, 2008, at 1:38 PM, Sean Arietta wrote:
>>
>>>
>>> Could you please provide some small code snippets elaborating on how
>>> you implemented that? I have a similar need as the author of this 
>>> thread and I would appreciate any help. Thanks!
>>>
>>> Cheers,
>>> Sean
>>>
>>>
>>> Joman Chu-2 wrote:
>>>>
>>>> Hi, I use ToolRunner.run() for multiple MapReduce jobs. It seems to
>>>> work well. I've run sequences involving hundreds of MapReduce jobs 
>>>> in a for loop and it hasn't died on me yet.
>>>>
>>>> On Wed, July 9, 2008 4:28 pm, Mori Bellamy said:
>>>>> Hey all, I'm trying to chain multiple mapreduce jobs together to 
>>>>> accomplish a complex task. I believe that the way to do it is as
>>>>> follows:
>>>>>
>>>>> JobConf conf = new JobConf(getConf(), MyClass.class);
>>>>> // configure job: set mappers, reducers, etc.
>>>>> SequenceFileOutputFormat.setOutputPath(conf, myPath1);
>>>>> JobClient.runJob(conf);
>>>>>
>>>>> // new job
>>>>> JobConf conf2 = new JobConf(getConf(), MyClass.class);
>>>>> SequenceFileInputFormat.setInputPath(conf2, myPath1);
>>>>> // more configuration...
>>>>> JobClient.runJob(conf2);
>>>>>
>>>>> Is this the canonical way to chain jobs? I'm having some trouble 
>>>>> with this method: for especially long jobs, the later MR jobs 
>>>>> sometimes do not start up.
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Joman Chu
>>>> AIM: ARcanUSNUMquam
>>>> IRC: irc.liquid-silver.net
>>>>
>>>>
>>>>
>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/How-to-chain-multiple-hadoop-jobs--tp18370089p18452309.html
>>> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>>>
>>
>>
>>
>
> --
> View this message in context:
> http://www.nabble.com/How-to-chain-multiple-hadoop-jobs--tp18370089p18453200.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>
