more specifically, call
jobConf.get("map.input.file");

in the configure(JobConf conf) method of your mapper.
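a minimal sketch of that, using the old org.apache.hadoop.mapred API current at the time of this thread (class and path names here are hypothetical examples, not from the original mail):

```java
// Sketch: read the per-split input file path in configure() and use it in map().
// The framework sets "map.input.file" for each file-based input split.
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class LogMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private String inputFile;  // path of the file this map task is reading

  @Override
  public void configure(JobConf conf) {
    // Called once per task, before any map() calls for its split.
    inputFile = conf.get("map.input.file");
  }

  @Override
  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    // Dispatch on the source file instead of sniffing the line structure.
    if (inputFile != null && inputFile.contains("weblog")) {
      // handle Log File A records...
    } else {
      // handle Log File B records...
    }
    output.collect(new Text(inputFile == null ? "unknown" : inputFile), value);
  }
}
```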

there are some cases where this won't work, but in general it works fine.

and yes, you can add many input directories.

jobConf.addInputPath(...)
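for example, a driver along these lines (directory paths are made-up placeholders; in later Hadoop releases this call moved to FileInputFormat.addInputPath):

```java
// Sketch: old-API job setup that reads from two input directories.
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class MultiDirJob {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(MultiDirJob.class);
    conf.setJobName("multi-log");
    // Hypothetical directories; each added path contributes its files as splits.
    conf.addInputPath(new Path("/logs/dirA"));
    conf.addInputPath(new Path("/logs/dirB"));
    conf.setOutputPath(new Path("/logs/out"));
    JobClient.runJob(conf);
  }
}
```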

On Mar 4, 2008, at 5:54 PM, Ted Dunning wrote:


Yes.

Use the configure method, which is called each time a new file is used in the
map.  Save the file name in a field of the mapper.


The other alternative is to derive a new InputFormat that remembers the
input file name.
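That alternative could look roughly like this (a sketch only, against the old org.apache.hadoop.mapred API; the class name is made up): wrap the stock LineRecordReader and emit the file path as the key instead of the byte offset.

```java
// Sketch: an InputFormat whose keys are the input file path, not the offset.
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.LineRecordReader;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

public class FileNameTextInputFormat extends FileInputFormat<Text, Text> {

  public RecordReader<Text, Text> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) throws IOException {
    final FileSplit fileSplit = (FileSplit) split;
    final LineRecordReader lineReader = new LineRecordReader(job, fileSplit);
    final String fileName = fileSplit.getPath().toString();

    return new RecordReader<Text, Text>() {
      // The wrapped reader still produces offsets; we discard them.
      private final LongWritable offset = lineReader.createKey();

      public boolean next(Text key, Text value) throws IOException {
        key.set(fileName);  // key is the file path for every record
        return lineReader.next(offset, value);
      }
      public Text createKey() { return new Text(); }
      public Text createValue() { return new Text(); }
      public long getPos() throws IOException { return lineReader.getPos(); }
      public float getProgress() throws IOException {
        return lineReader.getProgress();
      }
      public void close() throws IOException { lineReader.close(); }
    };
  }
}
```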


On 3/4/08 5:38 PM, "Tarandeep Singh" <[EMAIL PROTECTED]> wrote:

Hi,

I need to identify which file a key came from in the map phase.
Is that possible?

What I have is multiple types of log files in one directory that I
need to process for my application. Right now, I am relying on the
structure of the log files (e.g. if a line starts with "weblog", the
line came from Log File A, or if the number of tab-separated fields in
the line is N, then it is Log File B).

Is there a better way to do this?

Is there a way to have the Hadoop framework pass me the path of the
file as the key (right now the key is the offset in the file, I guess)?

One more related question - can I set 2 directories as input to my map
reduce program? This is just to avoid copying files from one log
directory to another.

thanks,
Taran


Chris K Wensel
[EMAIL PROTECTED]
http://chris.wensel.net/
