[ https://issues.apache.org/jira/browse/MAPREDUCE-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867397#action_12867397 ]

luo Yi commented on MAPREDUCE-1743:
-----------------------------------

The following code can recover the real file name from the TaggedInputSplit. 
Because TaggedInputSplit is a package-private Hadoop class, you have to put 
your own class in the org.apache.hadoop.mapred.lib package:

{code:title=TaggedInputSplitGetName.java|borderStyle=solid}
InputSplit is = reporter.getInputSplit();
String name = is.getClass().getName();
// plain FileSplit: the path is available directly
if (name.equals("org.apache.hadoop.mapred.FileSplit")) {
    FileSplit fs = (FileSplit) is;
    String path = fs.getPath().toString();
    word.set(path);
    output.collect(word, one);
}
// TaggedInputSplit (used by MultipleInputs): unwrap the inner split first
if (name.equals("org.apache.hadoop.mapred.lib.TaggedInputSplit")) {
    TaggedInputSplit tis = (TaggedInputSplit) is;
    InputSplit iis = tis.getInputSplit();
    String iname = iis.getClass().getName();
    word.set(iname);
    output.collect(word, one);
    if (iname.equals("org.apache.hadoop.mapred.FileSplit")) {
        FileSplit fs = (FileSplit) iis;
        // the path recovered from the TaggedInputSplit is prefixed with "convert: "
        String path = "convert: " + fs.getPath().toString();
        word.set(path);
        output.collect(word, one);
    }
}
{code}

and the output file gives me:

{noformat}
$ grep 'convert' testout/part-00000 | head -n 5
convert: hdfs://myowndir/pt=20100513000000/attempt_201003291206_327196_r_000000_0    1
convert: hdfs://myowndir/pt=20100513000000/attempt_201003291206_327196_r_000001_0    1
convert: hdfs://myowndir/pt=20100513000000/attempt_201003291206_327196_r_000002_0    1
convert: hdfs://myowndir/pt=20100513000000/attempt_201003291206_327196_r_000003_0    1
convert: hdfs://myowndir/pt=20100513000000/attempt_201003291206_327196_r_000004_0    1
{noformat}

You may give it a try.
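As an alternative to moving your mapper into the org.apache.hadoop.mapred.lib package, the package-private getInputSplit() accessor can also be reached via reflection. A minimal sketch of that reflection idea, using a hypothetical stand-in class (FakeTaggedSplit) in place of the real TaggedInputSplit so it runs without Hadoop on the classpath:

```java
import java.lang.reflect.Method;

public class ReflectiveSplitLookup {

    // Stand-in for the package-private TaggedInputSplit; the real class
    // lives in org.apache.hadoop.mapred.lib and hides this accessor.
    static class FakeTaggedSplit {
        private final Object wrapped;
        FakeTaggedSplit(Object wrapped) { this.wrapped = wrapped; }
        // private here to mimic a method we cannot call directly
        private Object getInputSplit() { return wrapped; }
    }

    // Invoke getInputSplit() reflectively, regardless of its visibility,
    // so the caller does not need to live in the split's package.
    static Object unwrap(Object split) throws Exception {
        Method m = split.getClass().getDeclaredMethod("getInputSplit");
        m.setAccessible(true); // bypass the visibility restriction
        return m.invoke(split);
    }

    public static void main(String[] args) throws Exception {
        // stands in for the wrapped FileSplit's path
        Object inner = unwrap(new FakeTaggedSplit("hdfs://some/path"));
        System.out.println(inner); // prints hdfs://some/path
    }
}
```

With the real classes, the same unwrap call would hand back the wrapped FileSplit, whose getPath() gives the input file name.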

> conf.get("map.input.file") returns null when using MultipleInputs in Hadoop 
> 0.20
> --------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1743
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1743
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.20.2
>            Reporter: Yuanyuan Tian
>
> There is a problem in getting the input file name in the mapper when using 
> MultipleInputs in Hadoop 0.20. I need to use MultipleInputs to support 
> different formats for my inputs to my MapReduce job. And inside each 
> mapper, I also need to know the exact input file that the mapper is 
> processing. However, conf.get("map.input.file") returns null. Can anybody 
> help me solve this problem? Thanks in advance.
> public class Test extends Configured implements Tool{
>       static class InnerMapper extends MapReduceBase implements 
> Mapper<Writable, Writable, NullWritable, Text>
>       {
>               ................
>               ................
>               public void configure(JobConf conf)
>               {       
>                       String inputName=conf.get("map.input.file");
>                       .......................................
>               }
>               
>       }
>       
>       public int run(String[] arg0) throws Exception {
>               JobConf job;
>               job = new JobConf(Test.class);
>               ...........................................
>               
>               MultipleInputs.addInputPath(job, new Path("A"), 
> TextInputFormat.class);
>               MultipleInputs.addInputPath(job, new Path("B"), 
> SequenceFileInputFormat.class);
>               ...........................................
>       }
> }

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
