Here is some more complete sample code that is based on my own MapReduce jobs.

//imports (this code uses the old org.apache.hadoop.mapred API)
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.conf.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.io.SequenceFile.CompressionType;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class MyMapReduceTool extends Configured implements Tool {
        public int run(String[] args) throws Exception {
                JobConf conf = new JobConf(getConf(), MyMapReduceTool.class);
                conf.setJobName("SomeName");

                conf.setMapOutputKeyClass(Text.class);
                conf.setMapOutputValueClass(Text.class);

                conf.setOutputKeyClass(Text.class);
                conf.setOutputValueClass(Text.class);

                conf.setMapperClass(MapClass.class);
                conf.setReducerClass(Reduce.class);

                //basically I use only SequenceFiles for I/O in most of my jobs
                conf.setInputFormat(SequenceFileInputFormat.class);
                conf.setCompressMapOutput(true);
                conf.setMapOutputCompressionType(CompressionType.BLOCK);
                conf.setOutputFormat(SequenceFileOutputFormat.class);
                SequenceFileOutputFormat.setCompressOutput(conf, true);
                SequenceFileOutputFormat.setOutputCompressionType(conf,
                                CompressionType.BLOCK);
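                //BLOCK compression packs many key/value pairs into each compressed
                //block, which usually compresses better than doing each record separately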

                //args parsing
                Path in = new Path(args[0]);
                Path out = new Path(args[1]);
                FileInputFormat.setInputPaths(conf, in);
                FileOutputFormat.setOutputPath(conf, out);

                //any other config things you might want to do

                JobClient.runJob(conf);
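                //runJob() submits the job and blocks until it completes (and throws
                //IOException if the job fails), so a job configured after this call
                //only starts once this one is done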
                return 0;
        }

        public static class MapClass extends MapReduceBase
                        implements Mapper<Text, Text, Text, Text> {
                public void configure(JobConf job) { //optional method
                        //stuff goes here
                }
                public void map(Text key, Text value,
                                OutputCollector<Text, Text> output, Reporter reporter)
                                throws IOException {
                        //some stuff here
                }
                public void close() { //optional method
                        //some stuff here
                }
        }

        public static class Reduce extends MapReduceBase
                        implements Reducer<Text, Text, Text, Text> {
                public void configure(JobConf job) { //optional method
                        //stuff goes here
                }
                public void reduce(Text key, Iterator<Text> values,
                                OutputCollector<Text, Text> output, Reporter reporter)
                                throws IOException {
                        //stuff goes here
                }
                public void close() { //optional method
                        //stuff goes here
                }
        }

        public static void main(String[] args) throws Exception {
                //pass along the command line arguments (or build your own String[])
                int res = ToolRunner.run(new Configuration(), new MyMapReduceTool(), args);
                System.exit(res);
        }
}
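
Since the original question was about chaining, here is a rough sketch of how
I'd chain two jobs inside run(). This is only an illustration, not code I've
run as-is: FirstMap, FirstReduce, SecondMap, SecondReduce and the "-stage1"
intermediate path are made-up names. The idea is just that the output path of
job one becomes the input path of job two, and runJob() blocks, so job two is
only submitted after job one finishes.

        public int run(String[] args) throws Exception {
                Path in = new Path(args[0]);
                //intermediate path between the two jobs (hypothetical naming scheme)
                Path mid = new Path(args[1] + "-stage1");
                Path out = new Path(args[1]);

                //first job: reads the original input, writes the intermediate path
                JobConf first = new JobConf(getConf(), MyMapReduceTool.class);
                first.setJobName("FirstJob");
                first.setMapperClass(FirstMap.class);
                first.setReducerClass(FirstReduce.class);
                first.setOutputKeyClass(Text.class);
                first.setOutputValueClass(Text.class);
                first.setOutputFormat(SequenceFileOutputFormat.class);
                FileInputFormat.setInputPaths(first, in);
                FileOutputFormat.setOutputPath(first, mid);
                JobClient.runJob(first); //blocks until the first job completes

                //second job: reads the intermediate path, writes the final output
                JobConf second = new JobConf(getConf(), MyMapReduceTool.class);
                second.setJobName("SecondJob");
                second.setMapperClass(SecondMap.class);
                second.setReducerClass(SecondReduce.class);
                second.setOutputKeyClass(Text.class);
                second.setOutputValueClass(Text.class);
                second.setInputFormat(SequenceFileInputFormat.class);
                FileInputFormat.setInputPaths(second, mid);
                FileOutputFormat.setOutputPath(second, out);
                JobClient.runJob(second);

                return 0;
        }

The same pattern works in a for loop: each iteration builds a fresh JobConf
and points its input at the previous iteration's output.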

Joman Chu
AIM: ARcanUSNUMquam
IRC: irc.liquid-silver.net


On Mon, Jul 14, 2008 at 5:46 PM, Joman Chu <[EMAIL PROTECTED]> wrote:
> Hi, I don't have the code sitting in front of me at the moment, but
> I'll do some of it from memory and I'll post a real snippet tomorrow
> night. Hopefully, this can get you started.
>
> public class MyMainClass {
>        public static void main(String[] args) {
>                ToolRunner.run(new Configuration(), new ClassThatImplementsTool(), args);
>                //make sure you see the API for other trickiness you can do.
>        }
> }
>
> public class ClassThatImplementsTool implements Tool {
>        public int run(String[] args) {
>                //this method gets called by ToolRunner.run
>                //do all sorts of configuration here
>                //ie, set your Map, Combine, Reduce class
>                //look at the Configuration class API
>                return 0;
>        }
> }
>
> The main thing to know is that ToolRunner.run() will call your
> class's run() method.
>
> Joman Chu
> AIM: ARcanUSNUMquam
> IRC: irc.liquid-silver.net
>
>
> On Mon, Jul 14, 2008 at 4:38 PM, Sean Arietta <[EMAIL PROTECTED]> wrote:
>>
>> Could you please provide some small code snippets elaborating on how you
>> implemented that? I have a similar need as the author of this thread and I
>> would appreciate any help. Thanks!
>>
>> Cheers,
>> Sean
>>
>>
>> Joman Chu-2 wrote:
>>>
>>> Hi, I use ToolRunner.run() for multiple MapReduce jobs. It seems to work
>>> well. I've run sequences involving hundreds of MapReduce jobs in a for
>>> loop and it hasn't died on me yet.
>>>
>>> On Wed, July 9, 2008 4:28 pm, Mori Bellamy said:
>>>> Hey all, I'm trying to chain multiple mapreduce jobs together to
>>>> accomplish a complex task. I believe that the way to do it is as follows:
>>>>
>>>> JobConf conf = new JobConf(getConf(), MyClass.class);
>>>> //configure job.... set mappers, reducers, etc
>>>> SequenceFileOutputFormat.setOutputPath(conf, myPath1);
>>>> JobClient.runJob(conf);
>>>>
>>>> //new job
>>>> JobConf conf2 = new JobConf(getConf(), MyClass.class);
>>>> SequenceFileInputFormat.setInputPath(conf2, myPath1);
>>>> //more configuration...
>>>> JobClient.runJob(conf2);
>>>>
>>>> Is this the canonical way to chain jobs? I'm having some trouble with
>>>> this method -- for especially long jobs, the latter MR tasks sometimes
>>>> do not start up.
>>>>
>>>>
>>>
>>>
>>> --
>>> Joman Chu
>>> AIM: ARcanUSNUMquam
>>> IRC: irc.liquid-silver.net
>>>
>>>
>>>
>>
>>
>>
>>
>
