Hi Harsh,

Thanks for the approach. One problem I am facing in implementing a custom InputFormat is this:
I am trying to set the map.input.file property in the getRecordReader method of my InputFormat implementation, using the following code snippet:

    FileSplit fs = (FileSplit) inputSplit;
    fileName = fs.getPath().getName();

The result I get is the path in which the files are placed, not the name of the file in that path. I was under the impression that fs.getPath().getName() would give me the name of the file in that input split, based on this tutorial:

http://developer.yahoo.com/hadoop/tutorial/module4.html

Is my understanding wrong? Please provide some pointers.

Thanks,
Sahana

On Fri, Sep 9, 2011 at 6:23 PM, Harsh J <ha...@cloudera.com> wrote:
> Sahana,
>
> On Fri, Sep 9, 2011 at 5:31 PM, Sahana Bhat <sana.b...@gmail.com> wrote:
> > Hi,
> >
> > I found this link
> > https://issues.apache.org/jira/browse/MAPREDUCE-1743 related to the
> > subject of my mail. Has this been resolved as yet, or is there any
> > workaround to get the filename while using MultipleInputs?
>
> One workaround could be to pass your own InputFormat implementations,
> whose RecordReaders set the "map.input.file" config property before
> they begin reading. I'll take a look at that JIRA, meanwhile.
>
> > We have a restriction to use Hadoop 0.20.2 version as the 0.21.0 version
> > release (says unstable, unsupported). Also MultipleInputs uses JobConf and
> > hence I cannot get the context object to retrieve the filename :( .
>
> Yes, this is a genuine problem with 0.20.2. I'd reiterate that it is
> better if one sticks to the stable API for the 0.20 lifetime. Don't
> let the 'deprecated' markers fool you, cause they were 'undeprecated'
> later.
>
> --
> Harsh J
>
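
On the question of what getPath().getName() should return: Hadoop's Path.getName() is documented to return the final component of the path. Below is a minimal sketch of that behaviour. It uses java.nio.file as a stand-in for the Hadoop classes, purely so that it runs without Hadoop on the classpath; the class name, the helper name and the sample path are all hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class SplitNameDemo {

    // Hypothetical helper: returns the final component of a path string.
    // This mirrors what Hadoop's Path.getName() is documented to return,
    // but it is implemented with java.nio.file, not with Hadoop.
    public static String finalComponent(String fullPath) {
        Path name = Paths.get(fullPath).getFileName();
        // getFileName() is null for a root path such as "/"
        return name == null ? "" : name.toString();
    }

    public static void main(String[] args) {
        // Hypothetical split path, assumed to point at a file.
        String splitPath = "/data/input/logs/part-00000";

        // Print all three views to see which one is actually in hand.
        System.out.println("full path : " + Paths.get(splitPath));
        System.out.println("parent    : " + Paths.get(splitPath).getParent());
        System.out.println("file name : " + finalComponent(splitPath)); // file name : part-00000
    }
}
```

Logging the full path, the parent and the final component of the split from inside getRecordReader in the same way may help show which of the three values is actually being stored in the property.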