Thanks for all of the help... Here is what I am working with:
1. I do use Eclipse to run the jar... There is an option in the Hadoop
plugin for Eclipse to run applications, so maybe that is causing the problem.
2. I am not really updating any Hadoop conf params... Here is what I am
doing:
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.superres.TrainingInputFormat;

class TestDriver extends Configured implements Tool {
    public static JobConf conf;

    public int run(String[] args) {
        JobClient client = new JobClient();
        client.setConf(conf);
        while (blah blah) {   // loop that re-submits the same static conf each pass
            try {
                JobClient.runJob(conf);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        return 1;
    }

    public static void main(String[] args) throws Exception {
        conf = new JobConf(myclass.class);
        // Set output formats
        conf.setOutputKeyClass(FloatWritable.class);
        conf.setOutputValueClass(LongWritable.class);
        // Set input format
        conf.setInputFormat(org.superres.TrainingInputFormat.class);
        Path output_path = new Path("out");
        FileOutputFormat.setOutputPath(conf, output_path);
        // Set input path
        TrainingInputFormat.setInputPaths(conf, new Path("input"));
        // Set up the Hadoop classes to be used
        conf.setMapperClass(org.superres.TestMap.class);
        conf.setCombinerClass(org.superres.TestReduce.class);
        conf.setReducerClass(org.superres.TestReduce.class);
        ToolRunner.run(conf, new TestDriver(), args);
    }
}
So yes, the main method is in the class used to "drive" the Hadoop program,
but as far as modifying the configuration goes, I don't think I am doing
that, since it is all set up once in the main method.
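
For what it's worth, below is a rough sketch (not my actual code) of how I
understand the loop would have to change if I followed Ankur's suggestion
further down and built a brand-new JobConf for every submission. The
ChainSketch class, runChain(), numPasses, and the per-pass "out-<i>" output
paths are all just placeholder names for illustration:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class ChainSketch {
    // Placeholder driver: firstJob is a fully configured JobConf (as built in
    // main()), numPasses is however many chained jobs need to run.
    public static void runChain(JobConf firstJob, int numPasses) throws Exception {
        JobConf prev = firstJob;
        JobClient.runJob(prev);                   // blocks until the first job finishes
        for (int i = 1; i < numPasses; i++) {
            // Fresh JobConf per job, copying parameters from the previous one
            JobConf next = new JobConf(prev, ChainSketch.class);
            // Each pass reads whatever the previous pass wrote
            FileInputFormat.setInputPaths(next, FileOutputFormat.getOutputPath(prev));
            FileOutputFormat.setOutputPath(next, new Path("out-" + i));
            JobClient.runJob(next);
            prev = next;
        }
    }
}

main() would then keep building the first JobConf exactly as it does now and
hand it to runChain() instead of looping over the same static conf.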
Cheers,
Sean
Goel, Ankur wrote:
>
> Hadoop typically complains if you try to re-use a JobConf object by
> modifying job parameters (Mapper, Reducer, output path, etc.) and
> re-submitting it to the job client. You should be creating a new JobConf
> object for every map-reduce job and if there are some parameters that
> should be copied from previous job, then you should be doing
>
> JobConf newJob = new JobConf(oldJob, MyClass.class);
> ...(your changes to newJob) ...
> JobClient.runJob(newJob)
>
> This works for me.
>
> -----Original Message-----
> From: Mori Bellamy [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, July 15, 2008 4:27 AM
> To: [email protected]
> Subject: Re: How to chain multiple hadoop jobs?
>
> Weird. I use Eclipse, but that's never happened to me. When you set up
> your JobConfs, for example:
> JobConf conf2 = new JobConf(getConf(), MyClass.class) is your "MyClass"
> in the same package as your driver program? Also, do you run from
> Eclipse or from the command line (I've never tried to launch a Hadoop
> task from Eclipse)? If you run from the command line:
>
> hadoop jar MyMRTaskWrapper.jar myEntryClass option1 option2...
>
> and all of the requisite resources are in MyMRTaskWrapper.jar, I don't
> see what the problem would be. If this is the way you run a Hadoop task,
> are you sure that all of the resources are getting compiled into the
> same jar? When you export a jar from Eclipse, it won't pack up external
> resources by default. (Look into add-ons like FatJAR for that.)
>
>
> On Jul 14, 2008, at 2:25 PM, Sean Arietta wrote:
>
>>
>> Well that's what I need to do also... but Hadoop complains to me when
>> I attempt to do that. Are you using Eclipse by any chance to develop?
>> The
>> error I'm getting seems to be stemming from the fact that Hadoop
>> thinks I am uploading a new jar for EVERY execution of
>> JobClient.runJob() so it fails indicating the job jar file doesn't
>> exist. Did you have to turn something on/off to get it to ignore that
>> or are you using a different IDE?
>> Thanks!
>>
>> Cheers,
>> Sean
>>
>>
>> Mori Bellamy wrote:
>>>
>>> Hey Sean,
>>>
>>> I later learned that the method I originally posted (configuring
>>> different JobConfs and then running them, blocking style, with
>>> JobClient.runJob(conf)) was sufficient for my needs. The reason it
>>> was failing before was somehow my fault and the bugs somehow got
>>> fixed x_X.
>>>
>>> Lukas gave me a helpful reply pointing me to TestJobControl.java (in
>>> the Hadoop source directory). It seems like this would be helpful if
>>> your job dependencies are complex. But for me, I just need to do one
>>> job after another (and every job only depends on the one right before
>>> it), so the code I originally posted works fine.
>>
>>
>
>
>