Here is some more complete sample code that is based on my own MapReduce jobs.
//imports needed for the old org.apache.hadoop.mapred API
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile.CompressionType;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyMapReduceTool extends Configured implements Tool {
    public int run(String[] args) throws Exception {
        JobConf conf = new JobConf(getConf(), MyMapReduceTool.class);
        conf.setJobName("SomeName");
        conf.setMapOutputKeyClass(Text.class);
        conf.setMapOutputValueClass(Text.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);
        conf.setMapperClass(MapClass.class);
        conf.setReducerClass(Reduce.class);
        //basically I use only sequence files for i/o in most of my jobs
        conf.setInputFormat(SequenceFileInputFormat.class);
        conf.setCompressMapOutput(true);
        conf.setMapOutputCompressionType(CompressionType.BLOCK);
        conf.setOutputFormat(SequenceFileOutputFormat.class);
        SequenceFileOutputFormat.setCompressOutput(conf, true);
        SequenceFileOutputFormat.setOutputCompressionType(conf,
                CompressionType.BLOCK);
        //args parsing
        Path in = new Path(args[0]);
        Path out = new Path(args[1]);
        FileInputFormat.setInputPaths(conf, in);
        FileOutputFormat.setOutputPath(conf, out);
        //any other config things you might want to do
        JobClient.runJob(conf);
        return 0;
    }

    public static class MapClass extends MapReduceBase implements
            Mapper<Text, Text, Text, Text> {
        public void configure(JobConf job) { //optional method
            //stuff goes here
        }
        public void map(Text key, Text value,
                OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            //some stuff here
        }
        public void close() { //optional method
            //some stuff here
        }
    }

    public static class Reduce extends MapReduceBase implements
            Reducer<Text, Text, Text, Text> {
        public void configure(JobConf job) { //optional method
            //stuff goes here
        }
        public void reduce(Text key, Iterator<Text> values,
                OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            //stuff goes here
        }
        public void close() { //this method is optional
            //stuff goes here
        }
    }

    //note: main() lives on the Tool class itself, not inside Reduce
    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new MyMapReduceTool(),
                args);
        System.exit(res);
    }
}
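
Since the original question was about chaining, here is a minimal sketch of how two jobs can be strung together inside run() with this same pattern. It is untested and assumes the same old org.apache.hadoop.mapred API as above; the job names and the intermediate path are placeholders:

public int run(String[] args) throws Exception {
    Path in = new Path(args[0]);
    Path mid = new Path(args[1]); //intermediate sequence files (placeholder)
    Path out = new Path(args[2]);

    JobConf first = new JobConf(getConf(), MyMapReduceTool.class);
    first.setJobName("pass1");
    //set mapper/reducer, key/value classes, and formats as above...
    FileInputFormat.setInputPaths(first, in);
    FileOutputFormat.setOutputPath(first, mid);
    //JobClient.runJob() blocks until the job finishes and throws on
    //failure, so the second job only starts after the first completes
    JobClient.runJob(first);

    JobConf second = new JobConf(getConf(), MyMapReduceTool.class);
    second.setJobName("pass2");
    //the second job reads what the first one wrote
    FileInputFormat.setInputPaths(second, mid);
    FileOutputFormat.setOutputPath(second, out);
    //set the rest of the configuration as above...
    JobClient.runJob(second);
    return 0;
}

The same idea extends to a for loop over many passes, which is how I've run the longer sequences of jobs mentioned below.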
Joman Chu
AIM: ARcanUSNUMquam
IRC: irc.liquid-silver.net
On Mon, Jul 14, 2008 at 5:46 PM, Joman Chu <[EMAIL PROTECTED]> wrote:
> Hi, I don't have the code sitting in front of me at the moment, but
> I'll do some of it from memory, and I'll post a real snippet tomorrow
> night. Hopefully this can get you started.
>
> public class MyMainClass {
>     public static void main(String[] args) throws Exception {
>         ToolRunner.run(new Configuration(), new ClassThatImplementsTool(),
>                 args);
>         //make sure you see the API for other trickiness you can do
>     }
> }
>
> public class ClassThatImplementsTool extends Configured implements Tool {
>     public int run(String[] args) throws Exception {
>         //this method gets called by ToolRunner.run()
>         //do all sorts of configuration here
>         //i.e., set your Map, Combine, and Reduce classes
>         //look at the JobConf and Configuration class APIs
>         return 0;
>     }
> }
>
> The main thing to know is that ToolRunner.run() will call your
> class's run() method.
>
> Joman Chu
> AIM: ARcanUSNUMquam
> IRC: irc.liquid-silver.net
>
>
> On Mon, Jul 14, 2008 at 4:38 PM, Sean Arietta <[EMAIL PROTECTED]> wrote:
>>
>> Could you please provide some small code snippets elaborating on how you
>> implemented that? I have a similar need as the author of this thread and I
>> would appreciate any help. Thanks!
>>
>> Cheers,
>> Sean
>>
>>
>> Joman Chu-2 wrote:
>>>
>>> Hi, I use ToolRunner.run() for multiple MapReduce jobs. It seems to work
>>> well. I've run sequences involving hundreds of MapReduce jobs in a for
>>> loop and it hasn't died on me yet.
>>>
>>> On Wed, July 9, 2008 4:28 pm, Mori Bellamy said:
>>>> Hey all, I'm trying to chain multiple MapReduce jobs together to
>>>> accomplish a complex task. I believe the way to do it is as follows:
>>>>
>>>> JobConf conf = new JobConf(getConf(), MyClass.class);
>>>> //configure job... set mappers, reducers, etc.
>>>> SequenceFileOutputFormat.setOutputPath(conf, myPath1);
>>>> JobClient.runJob(conf);
>>>>
>>>> //new job
>>>> JobConf conf2 = new JobConf(getConf(), MyClass.class);
>>>> SequenceFileInputFormat.setInputPaths(conf2, myPath1);
>>>> //more configuration...
>>>> JobClient.runJob(conf2);
>>>>
>>>> Is this the canonical way to chain jobs? I'm having some trouble with
>>>> this method -- for especially long jobs, the latter MR tasks sometimes
>>>> do not start up.
>>>>
>>>>
>>>
>>>
>>> --
>>> Joman Chu
>>> AIM: ARcanUSNUMquam
>>> IRC: irc.liquid-silver.net
>>>
>>>
>>>
>>