Hi Tarandeep,

The JobConf you get in your configure() method has the information. It is available via the map.input.file parameter (more info here: http://wiki.apache.org/hadoop/TaskExecutionEnvironment).
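For example, a mapper along these lines should work (untested sketch against the old mapred API; the class name, the "weblog" path check, and the emitted tag are just placeholders):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class LogMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private String inputFile;

  public void configure(JobConf job) {
    // Full path of the file this map task is reading.
    inputFile = job.get("map.input.file");
  }

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    // Branch on the source file instead of guessing from the line format.
    String tag = (inputFile != null && inputFile.contains("weblog")) ? "A" : "B";
    output.collect(new Text(tag), value);
  }
}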
Yes, you can have multiple input directories. Use JobConf::addInputPath() to add more input paths before submitting your job (more info here: http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/JobConf.html#addInputPath(org.apache.hadoop.fs.Path)); a minimal driver sketch follows below the quoted message.

Thanks,
Lohit

----- Original Message ----
From: Tarandeep Singh <[EMAIL PROTECTED]>
To: [email protected]
Sent: Tuesday, March 4, 2008 5:38:41 PM
Subject: Processing multiple files - need to identify in map

Hi,

I need to identify which file a key came from in the map phase. Is it possible?

What I have is multiple types of log files in one directory that I need to process for my application. Right now I am relying on the structure of the log files (e.g. if a line starts with "weblog", the line came from Log File A, or if the number of tab-separated fields in the line is N, then it is Log File B). Is there a better way to do this? Is there a way for the Hadoop framework to pass me the path of the file as the key (right now the key is the offset in the file, I guess)?

One more related question: can I set 2 directories as input to my map-reduce program? This is just to avoid copying files from one log directory to another.

thanks,
Taran
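For the second question, the driver setup would look something like this (untested sketch; the paths, job name, and the LogMapper class from the example above are placeholders):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class LogDriver {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(LogDriver.class);
    conf.setJobName("process-logs");

    // Add each log directory as an input path; no copying needed.
    conf.addInputPath(new Path("/logs/dirA"));
    conf.addInputPath(new Path("/logs/dirB"));
    conf.setOutputPath(new Path("/logs/output"));

    conf.setMapperClass(LogMapper.class);
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);

    JobClient.runJob(conf);
  }
}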
