I'm trying to get the name of the file that the map job is operating on out of the Context passed to the setup function. It's proving harder than seems proper.
I've found several links via Google on this topic, but I've seen no responses to the previous questions. We have this from July 17, 2009:

http://www.mail-archive.com/[email protected]/msg00535.html

I attempted that solution and javac complained about using a deprecated API. The approach is very clearly spelled out in this doc:

http://hadoop.apache.org/common/docs/r0.20.1/mapred_tutorial.html

and yet the example source code for 20.1 still uses the deprecated mapred.* API, just as the prior link does.

For the record, here's what I've tried, in the hope that someone will just paste back a working solution:

    import java.io.IOException;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.util.GenericOptionsParser;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.RecordWriter;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;
    import org.apache.hadoop.mapred.FileSplit;

    public class Foo {

      public static class FooMapper
          extends Mapper<Object, Text, Text, IntWritable> {

        private org.apache.hadoop.io.Text input_file;

        public void setup (Context context) {
          Configuration conf = context.getConfiguration();
          //
          // fails to compile due to use of deprecated mapred API:
          //
          FileSplit fileSplit = (FileSplit)context.getInputSplit();
          String input_fname = fileSplit.getPath().toString();
          input_file.set(input_fname);
          //
          // results in null pointer exception because conf.get returns null:
          //
          // input_file.set(conf.get("map.input.file"));
        }
      }
    }
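One thing that may be worth trying is sketched below. It assumes the job uses the new org.apache.hadoop.mapreduce API throughout and that the input format produces file-based splits. The only substantive change from the attempt above is the FileSplit import, which here comes from org.apache.hadoop.mapreduce.lib.input rather than org.apache.hadoop.mapred. The class name InputFileSketch and the helper inputPathOf are hypothetical names made up for illustration. The sketch has not been run against a cluster.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Mapper;
// New-API FileSplit; the deprecated one lives in org.apache.hadoop.mapred.
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Hypothetical wrapper class, named only for this sketch.
class InputFileSketch {

    // Hypothetical helper: returns the split's path as a string, or null
    // when the split is not file-based (assumption: callers tolerate null).
    static String inputPathOf(InputSplit split) {
        if (split instanceof FileSplit) {
            return ((FileSplit) split).getPath().toString();
        }
        return null;
    }

    public static class FooMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        // Allocated up front; calling set() on an unassigned field
        // would throw a NullPointerException.
        private final Text inputFile = new Text();

        @Override
        protected void setup(Context context)
                throws IOException, InterruptedException {
            // getInputSplit() returns the new-API InputSplit type.
            String path = inputPathOf(context.getInputSplit());
            if (path != null) {
                inputFile.set(path);
            }
        }
    }
}
```

Note that in the attempt above, input_file is declared but never assigned, so input_file.set(...) would throw a NullPointerException whatever conf.get("map.input.file") returns. If only the last path component is wanted rather than the full path, getPath().getName() could be used in place of getPath().toString().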
