Thanks for all of the help... Here is what I am working with:
1. I do use Eclipse to run the jar... There is an option in the Hadoop
plugin for Eclipse to run applications, so maybe that is causing the problem.
2. I am not really updating any Hadoop conf params... Here is what I am
doing:
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.superres.TrainingInputFormat;

class TestDriver extends Configured implements Tool {
    public static JobConf conf;

    public int run(String[] args) {
        JobClient client = new JobClient();
        client.setConf(conf);
        while (blah blah) {   // loop that re-submits the same static conf each pass
            try {
                JobClient.runJob(conf);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        return 1;
    }

    public static void main(String[] args) throws Exception {
        conf = new JobConf(myclass.class);
        // Set output formats
        conf.setOutputKeyClass(FloatWritable.class);
        conf.setOutputValueClass(LongWritable.class);
        // Set input format
        conf.setInputFormat(org.superres.TrainingInputFormat.class);
        Path output_path = new Path("out");
        FileOutputFormat.setOutputPath(conf, output_path);
        // Set input path
        TrainingInputFormat.setInputPaths(conf, new Path("input"));
        // Set up the Hadoop classes to be used
        conf.setMapperClass(org.superres.TestMap.class);
        conf.setCombinerClass(org.superres.TestReduce.class);
        conf.setReducerClass(org.superres.TestReduce.class);
        ToolRunner.run(conf, new TestDriver(), args);
    }
}
So yes, the main method is in the class used to "drive" the Hadoop program,
but as far as modifying the configuration goes, I don't think I am doing
that, since it is all set up once in the main method.
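
For what it's worth, below is a rough sketch (not my actual code) of how I
understand the loop would have to change if I followed Ankur's suggestion
further down and built a brand-new JobConf for every submission. The
ChainSketch class, runChain(), numPasses, and the per-pass "out-<i>" output
paths are all just placeholder names for illustration:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class ChainSketch {
    // Placeholder driver: firstJob is a fully configured JobConf (as built in
    // main()), numPasses is however many chained jobs need to run.
    public static void runChain(JobConf firstJob, int numPasses) throws Exception {
        JobConf prev = firstJob;
        JobClient.runJob(prev);                   // blocks until the first job finishes
        for (int i = 1; i < numPasses; i++) {
            // Fresh JobConf per job, copying parameters from the previous one
            JobConf next = new JobConf(prev, ChainSketch.class);
            // Each pass reads whatever the previous pass wrote
            FileInputFormat.setInputPaths(next, FileOutputFormat.getOutputPath(prev));
            FileOutputFormat.setOutputPath(next, new Path("out-" + i));
            JobClient.runJob(next);
            prev = next;
        }
    }
}

main() would then keep building the first JobConf exactly as it does now and
hand it to runChain() instead of looping over the same static conf.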
Cheers,
Sean
Goel, Ankur wrote:
>
> Hadoop typically complains if you try to re-use a JobConf object by
> modifying job parameters (Mapper, Reducer, output path, etc.) and
> re-submitting it to the job client. You should be creating a new JobConf
> object for every map-reduce job and if there are some parameters that
> should be copied from previous job, then you should be doing
>
> JobConf newJob = new JobConf(oldJob, MyClass.class);
> ...(your changes to newJob) ...
> JobClient.runJob(newJob)
>
> This works for me.
>
> -----Original Message-----
> From: Mori Bellamy [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, July 15, 2008 4:27 AM
> To: [email protected]
> Subject: Re: How to chain multiple hadoop jobs?
>
> Weird. I use Eclipse, but that's never happened to me. When you set up
> your JobConfs, for example:
> JobConf conf2 = new JobConf(getConf(), MyClass.class) is your "MyClass"
> in the same package as your driver program? Also, do you run from
> Eclipse or from the command line (I've never tried to launch a Hadoop
> task from Eclipse)? If you run from the command line:
>
> hadoop jar MyMRTaskWrapper.jar myEntryClass option1 option2...
>
> and all of the requisite resources are in MyMRTaskWrapper.jar, I don't
> see what the problem would be. If this is the way you run a Hadoop task,
> are you sure that all of the resources are getting compiled into the
> same jar? When you export a jar from Eclipse, it won't pack up external
> resources by default. (Look into add-ons like FatJAR for that.)
>
>
> On Jul 14, 2008, at 2:25 PM, Sean Arietta wrote:
>
>>
>> Well that's what I need to do also... but Hadoop complains to me when
>> I attempt to do that. Are you using Eclipse by any chance to develop?
>> The
>> error I'm getting seems to be stemming from the fact that Hadoop
>> thinks I am uploading a new jar for EVERY execution of
>> JobClient.runJob() so it fails indicating the job jar file doesn't
>> exist. Did you have to turn something on/off to get it to ignore that
>> or are you using a different IDE?
>> Thanks!
>>
>> Cheers,
>> Sean
>>
>>
>> Mori Bellamy wrote:
>>>
>>> Hey Sean,
>>>
>>> I later learned that the method I originally posted (configuring
>>> different JobConfs and then running them, blocking style, with
>>> JobClient.runJob(conf)) was sufficient for my needs. The reason it
>>> was failing before was somehow my fault and the bugs somehow got
>>> fixed x_X.
>>>
>>> Lukas gave me a helpful reply pointing me to TestJobControl.java (in
>>> the Hadoop source directory). It seems like this would be helpful if
>>> your job dependencies are complex. But for me, I just need to do one
>>> job after another (and every job only depends on the one right before
>>> it), so the code I originally posted works fine.
>>
>>
>
>
>