working. Aside from that, the compiler also warns about a deprecated
interface when I compile. Any help you can give would be greatly
appreciated.
Here's what I need to do in the mapper:
Read through some logs.
Modulo the timestamp by some integer N, which I want to pass on the
command line.
Pull out some field M, which I want to pass on the command line.
If I hard code N and M, the job completes just fine and there's
proper
content in the output path.
If I pass N and M on the command line as otherArgs or as -D a=N -D
b=M and
attempt to assign these to the jobConf with configure(), the job
completes
successfully but with an empty output file. This tells me that
it's not
pulling the proper field (M) out of the log line. When passing as
otherArgs
and parsing in main(), I was using public static variables for
interval_length and field_name, but obviously that didn't work
either and I
don't know why.
How do I do this?
If it can't be done, I'll just have to write a bunch of Mapper
classes for
each combination of N and M, and that seems to be unnecessarily
verbose to
me.
The code:
import java.io.IOException;
import org.json.*;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
public class FooBar {
public static class FooMapper extends Mapper<Object, Text,
Text,
IntWritable> {
private final static IntWritable one = new IntWritable
(1);
private Integer interval_length;
private String field_name;
public void configure(JobConf job) {
interval_length = new
Integer(job.get("foo.interval_length"));
field_name = job.get("foo.field_name");
}
public void map(Object key, Text value, Context
context)
throws IOException, InterruptedException {
try {
JSONObject j = new
JSONObject(value.toString());
int t = j.getInt("timestamp");
t = t % interval_length;
Text k = new Text(t.toString() + ":" +
j.getString(field_name));
context.write(k, one);
} catch (Exception e) {
//
// how to do something useful with
exceptions in Hadoop?
//
}
}
}
//
// main was taken from the WordCount.java example
//
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf,
args).getRemainingArgs();
if (otherArgs.length != 2) {
System.err.println("Usage: wordcount <in>
<out>");
System.exit(2);
}
Job job = new Job(conf, "wordcount");
job.setJarByClass(FooBar.class);
job.setMapperClass(FooMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(otherArgs
[0]));
FileOutputFormat.setOutputPath(job, new Path(otherArgs
[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}