more specifically, call
jobConf.get("map.input.file");

in the configure(JobConf conf) method of your mapper.
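a minimal sketch of that, using the old org.apache.hadoop.mapred API current at the time of this thread (class and path names here are hypothetical examples, not from the original mail):

```java
// Sketch: read the per-split input file path in configure() and use it in map().
// The framework sets "map.input.file" for each file-based input split.
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class LogMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private String inputFile;  // path of the file this map task is reading

  @Override
  public void configure(JobConf conf) {
    // Called once per task, before any map() calls for its split.
    inputFile = conf.get("map.input.file");
  }

  @Override
  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    // Dispatch on the source file instead of sniffing the line structure.
    if (inputFile != null && inputFile.contains("weblog")) {
      // handle Log File A records...
    } else {
      // handle Log File B records...
    }
    output.collect(new Text(inputFile == null ? "unknown" : inputFile), value);
  }
}
```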

there are some cases where this won't work, but in general it works fine.

and yes, you can add many input directories.

jobConf.addInputPath(...)
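for example, a driver along these lines (directory paths are made-up placeholders; in later Hadoop releases this call moved to FileInputFormat.addInputPath):

```java
// Sketch: old-API job setup that reads from two input directories.
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class MultiDirJob {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(MultiDirJob.class);
    conf.setJobName("multi-log");
    // Hypothetical directories; each added path contributes its files as splits.
    conf.addInputPath(new Path("/logs/dirA"));
    conf.addInputPath(new Path("/logs/dirB"));
    conf.setOutputPath(new Path("/logs/out"));
    JobClient.runJob(conf);
  }
}
```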

On Mar 4, 2008, at 5:54 PM, Ted Dunning wrote:


Yes.

Use the configure method, which is called each time a new file is used in the
map.  Save the file name in a field of the mapper.


The other alternative is to derive a new InputFormat that remembers the
input file name.
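That alternative could look roughly like this (a sketch only, against the old org.apache.hadoop.mapred API; the class name is made up): wrap the stock LineRecordReader and emit the file path as the key instead of the byte offset.

```java
// Sketch: an InputFormat whose keys are the input file path, not the offset.
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.LineRecordReader;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

public class FileNameTextInputFormat extends FileInputFormat<Text, Text> {

  public RecordReader<Text, Text> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) throws IOException {
    final FileSplit fileSplit = (FileSplit) split;
    final LineRecordReader lineReader = new LineRecordReader(job, fileSplit);
    final String fileName = fileSplit.getPath().toString();

    return new RecordReader<Text, Text>() {
      // The wrapped reader still produces offsets; we discard them.
      private final LongWritable offset = lineReader.createKey();

      public boolean next(Text key, Text value) throws IOException {
        key.set(fileName);  // key is the file path for every record
        return lineReader.next(offset, value);
      }
      public Text createKey() { return new Text(); }
      public Text createValue() { return new Text(); }
      public long getPos() throws IOException { return lineReader.getPos(); }
      public float getProgress() throws IOException {
        return lineReader.getProgress();
      }
      public void close() throws IOException { lineReader.close(); }
    };
  }
}
```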


On 3/4/08 5:38 PM, "Tarandeep Singh" <[EMAIL PROTECTED]> wrote:

Hi,

I need to identify which file a key came from in the map phase.
Is that possible?

What I have is multiple types of log files in one directory that I
need to process for my application. Right now, I am relying on the
structure of the log files (e.g. if a line starts with "weblog", the
line came from Log File A, or if the number of tab-separated fields in
the line is N, then it is Log File B).

Is there a better way to do this?

Is there a way to have the Hadoop framework pass me the path of the
file as the key (right now the key is the offset in the file, I guess)?

One more related question - can I set 2 directories as input to my map
reduce program? This is just to avoid copying files from one log
directory to another.

thanks,
Taran


Chris K Wensel
[EMAIL PROTECTED]
http://chris.wensel.net/
