[ https://issues.apache.org/jira/browse/MAPREDUCE-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867397#action_12867397 ]
luo Yi commented on MAPREDUCE-1743:
-----------------------------------

The following code can recover the real file name from the TaggedInputSplit. Because TaggedInputSplit is a package-private Hadoop class, you should place your class in the org.apache.hadoop.mapred.lib package:

{code:title=TaggedInputSplitGetName.java|borderStyle=solid}
InputSplit is = reporter.getInputSplit();
String name = is.getClass().getName();
if (name.equals("org.apache.hadoop.mapred.FileSplit")) {
    FileSplit fs = (FileSplit) is;
    String path = fs.getPath().toString();
    word.set(path);
    output.collect(word, one);
}
if (name.equals("org.apache.hadoop.mapred.lib.TaggedInputSplit")) {
    TaggedInputSplit tis = (TaggedInputSplit) is;
    InputSplit iis = tis.getInputSplit();
    String iname = iis.getClass().getName();
    word.set(iname);
    output.collect(word, one);
    if (iname.equals("org.apache.hadoop.mapred.FileSplit")) {
        FileSplit fs = (FileSplit) iis;
        // mark paths recovered through the TaggedInputSplit with a "convert: " prefix
        String path = "convert: " + fs.getPath().toString();
        word.set(path);
        output.collect(word, one);
    }
}
{code}

and the output file gives me:

{noformat}
$ grep 'convert' testout/part-00000 | head -n 5
convert: hdfs://myowndir/pt=20100513000000/attempt_201003291206_327196_r_000000_0	1
convert: hdfs://myowndir/pt=20100513000000/attempt_201003291206_327196_r_000001_0	1
convert: hdfs://myowndir/pt=20100513000000/attempt_201003291206_327196_r_000002_0	1
convert: hdfs://myowndir/pt=20100513000000/attempt_201003291206_327196_r_000003_0	1
convert: hdfs://myowndir/pt=20100513000000/attempt_201003291206_327196_r_000004_0	1
{noformat}

You may give it a try.

> conf.get("map.input.file") returns null when using MultipleInputs in Hadoop 0.20
> --------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1743
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1743
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.20.2
>            Reporter: Yuanyuan Tian
>
> There is a problem in getting the input file name in the mapper when using MultipleInputs in Hadoop 0.20. I need MultipleInputs to support different formats for the inputs to my MapReduce job, and inside each mapper I also need to know the exact input file that the mapper is processing. However, conf.get("map.input.file") returns null. Can anybody help me solve this problem? Thanks in advance.
>
> public class Test extends Configured implements Tool {
>     static class InnerMapper extends MapReduceBase implements Mapper<Writable, Writable, NullWritable, Text> {
>         ................
>         ................
>         public void configure(JobConf conf) {
>             String inputName = conf.get("map.input.file");
>             .......................................
>         }
>     }
>
>     public int run(String[] arg0) throws Exception {
>         JobConf job;
>         job = new JobConf(Test.class);
>         ...........................................
>         MultipleInputs.addInputPath(job, new Path("A"), TextInputFormat.class);
>         MultipleInputs.addInputPath(job, new Path("B"), SequenceFileInputFormat.class);
>         ...........................................
>     }
> }

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
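Instead of moving the mapper into the org.apache.hadoop.mapred.lib package, the wrapper can also be unwrapped reflectively, since TaggedInputSplit's getInputSplit() method is public even though the class itself is not. Below is a minimal JDK-only sketch of that reflection pattern; HiddenTaggedSplit and FileSplitStub are hypothetical stand-ins (not Hadoop classes) used so the example runs without Hadoop on the classpath. In a real mapper you would apply the same pathOf() logic to reporter.getInputSplit().

```java
import java.lang.reflect.Method;

// Sketch of the reflection workaround: when the concrete split type (like
// Hadoop's package-private TaggedInputSplit) cannot be named in user code,
// call its public getInputSplit() method reflectively.
public class ReflectiveSplitDemo {

    // Illustrative stand-in for org.apache.hadoop.mapred.FileSplit.
    static class FileSplitStub {
        private final String path;
        FileSplitStub(String path) { this.path = path; }
        String getPath() { return path; }
    }

    // Illustrative stand-in for TaggedInputSplit: a wrapper whose type we
    // pretend we cannot reference directly (hence it is private here).
    private static class HiddenTaggedSplit {
        private final Object inner;
        HiddenTaggedSplit(Object inner) { this.inner = inner; }
        public Object getInputSplit() { return inner; }
    }

    static Object makeSplit(String path) {
        return new HiddenTaggedSplit(new FileSplitStub(path));
    }

    // Unwrap the real split without naming the wrapper type.
    static String pathOf(Object split) throws Exception {
        Method m = split.getClass().getMethod("getInputSplit");
        m.setAccessible(true); // the declaring class is not accessible to us
        Object inner = m.invoke(split);
        return ((FileSplitStub) inner).getPath();
    }

    public static void main(String[] args) throws Exception {
        Object split = makeSplit("hdfs://myowndir/pt=20100513000000/part-00000");
        System.out.println(pathOf(split)); // prints the wrapped file path
    }
}
```

The setAccessible(true) call is what makes this work: without it, invoking a public method declared on an inaccessible class throws IllegalAccessException.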