The Reporter object passed to the map() method can give you the InputSplit
that is being mapped over (reporter.getInputSplit()). If that split is a
FileSplit, you can get the file's path from its getPath() method.
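As a rough sketch, assuming the old org.apache.hadoop.mapred API and a hypothetical naming convention where Log File A names start with "weblog" and Log File B names start with "applog", the mapper can tag each record by its source path instead of guessing from the line's structure. The Hadoop calls appear only in the comment, so the routing logic itself runs without Hadoop on the classpath. LogRouter, logTypeForPath and tag are illustrative names, not part of Hadoop.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/*
 * Routes each record by the path of the file it came from instead of by
 * the contents of the line. Inside a Hadoop mapper (old mapred API) the
 * path would come from the split, roughly:
 *
 *   public void map(LongWritable key, Text value,
 *                   OutputCollector<Text, Text> out, Reporter reporter)
 *       throws IOException {
 *     InputSplit split = reporter.getInputSplit();
 *     if (split instanceof FileSplit) {
 *       String path = ((FileSplit) split).getPath().toString();
 *       out.collect(new Text(LogRouter.logTypeForPath(path)), value);
 *     }
 *   }
 *
 * Here the path is passed in as a plain String so the routing logic can
 * run without Hadoop on the classpath.
 */
class LogRouter {
    // Hypothetical naming convention: file-name prefix -> log type.
    private static final Map<String, String> PREFIX_TO_TYPE = new LinkedHashMap<>();
    static {
        PREFIX_TO_TYPE.put("weblog", "A");
        PREFIX_TO_TYPE.put("applog", "B");
    }

    /** Returns the log type for a file path, or "UNKNOWN" if no prefix matches. */
    static String logTypeForPath(String path) {
        // Keep only the last path component, e.g. "weblog-2008-05-01.txt".
        String name = path.substring(path.lastIndexOf('/') + 1);
        for (Map.Entry<String, String> e : PREFIX_TO_TYPE.entrySet()) {
            if (name.startsWith(e.getKey())) {
                return e.getValue();
            }
        }
        return "UNKNOWN";
    }

    /** Prefixes a line with its log type and a tab, e.g. "A\tGET /index.html". */
    static String tag(String path, String line) {
        return logTypeForPath(path) + "\t" + line;
    }
}
```

For example, logTypeForPath("hdfs://namenode/logs/weblog-2008-05-01.txt") returns "A". Since a FileSplit covers part of a single file, the path is the same for every record in a given map task, so it could also be looked up once per task rather than per record.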
- Aaron
Tarandeep Singh wrote:
Hi,
I need to identify which file a key came from in the map phase.
Is that possible?
I have multiple types of log files in one directory that I need to
process for my application. Right now, I rely on the structure of the
log lines to tell them apart (e.g., if a line starts with "weblog", it
came from Log File A; if the line has N tab-separated fields, it came
from Log File B).
Is there a better way to do this?
Is there a way for the Hadoop framework to pass me the file's path as
the key (right now the key seems to be the offset within the file)?
One more related question: can I give my MapReduce program two
directories as input? I'd like to avoid copying files from one log
directory to another.
thanks,
Taran