Hey Mike,

There's a much easier way to do this. We've answered a very similar question in detail before at http://search-hadoop.com/m/ZOmmJ1PZJqt1 (the question covers the stable/old API, and my response covers the new API). Does this help?
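In short: with the new API, the filename comes from the task's InputSplit
rather than from a configuration key. A minimal sketch of that approach
(the LogFileMapper class, its key/value types, and the sample filename are
hypothetical; the cast assumes a FileInputFormat-based job, which produces
FileSplits):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class LogFileMapper extends Mapper<LongWritable, Text, Text, Text> {
  private String fileName;

  @Override
  protected void setup(Context context)
      throws IOException, InterruptedException {
    // Each map task reads one split; with a FileInputFormat subclass the
    // split is a FileSplit, which knows the path of the file being read.
    FileSplit split = (FileSplit) context.getInputSplit();
    fileName = split.getPath().getName(); // e.g. "access-2012-06-14.log"
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // The filename captured in setup() is now available for decoding the
    // date embedded in it.
    context.write(new Text(fileName), value);
  }
}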
On Thu, Jun 14, 2012 at 8:24 AM, Michael Parker <michael.g.par...@gmail.com> wrote:

> Hi all,
>
> I'm new to Hadoop MR and decided to make a go at using only the new
> API. I have a series of log files (who doesn't?), where a different
> date is encoded in each filename. There are so few log files that I'm
> not using HDFS. In my main method, I accept the input directory
> containing all the log files as the first command-line argument:
>
> Configuration conf = new Configuration();
> String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
> Path inputDir = new Path(otherArgs[0]);
> ...
> Job job1 = new Job(conf, "job1");
> FileInputFormat.addInputPath(job1, inputDir);
>
> I actually have two jobs chained using a JobControl, but I think
> that's irrelevant. The problem is that the Mapper of this job cannot
> get the filename by looking up the key "mapred.input.file" in the
> Context object, either in the mapper's setup method or in the call to
> map. Dumping the configuration like so:
>
> StringWriter writer = new StringWriter();
> Configuration.dumpConfiguration(context.getConfiguration(), writer);
> System.out.println("configuration=" + writer.toString());
>
> reveals that there is a "mapred.input.dir" key containing the path
> passed as a command-line argument and assigned to inputDir in my main
> method, but the name of the file being processed within that path is
> still inaccessible. Any ideas how to get this?
>
> Thanks,
> Mike

--
Harsh J