Thanks guys, your responses were very useful. -Taran
On Tue, Mar 4, 2008 at 5:56 PM, lohit <[EMAIL PROTECTED]> wrote:
> Hi Tarandeep,
>
> The JobConf you get in your configure() method has the info.
> It is available via the map.input.file parameter (more info here:
> http://wiki.apache.org/hadoop/TaskExecutionEnvironment)
>
> Yes, you can have multiple input directories.
> You can use JobConf::addInputPath() to add more input paths before
> submitting your job. More info here:
> http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/JobConf.html#addInputPath(org.apache.hadoop.fs.Path)
>
> Thanks,
> Lohit
>
>
> ----- Original Message ----
> From: Tarandeep Singh <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Tuesday, March 4, 2008 5:38:41 PM
> Subject: Processing multiple files - need to identify in map
>
> Hi,
>
> I need to identify which file a key came from in the map phase.
> Is that possible?
>
> What I have is multiple types of log files in one directory that I
> need to process for my application. Right now, I am relying on the
> structure of the log files (e.g. if a line starts with "weblog", the
> line came from Log File A, or if the number of tab-separated fields in
> the line is N, then it is Log File B).
>
> Is there a better way to do this?
>
> Is there a way to have the Hadoop framework pass me the path of the
> file as the key (right now it is the offset in the file, I guess)?
>
> One more related question - can I set 2 directories as input to my
> map/reduce program? This is just to avoid copying files from one log
> directory to another.
>
> thanks,
> Taran
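For readers finding this thread later, lohit's two suggestions can be sketched roughly as below, using the old org.apache.hadoop.mapred API that was current in 2008. The class name, directory paths, and the "weblog" check are illustrative assumptions, not from the thread; the small logType() helper is likewise hypothetical, added only so the routing logic is easy to see.

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class LogMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private String inputFile;

  @Override
  public void configure(JobConf job) {
    // The framework sets map.input.file to the path of the file
    // backing the current split.
    inputFile = job.get("map.input.file");
  }

  // Hypothetical helper: route a record by its source file's path
  // instead of sniffing the line's structure.
  static String logType(String path) {
    return (path != null && path.contains("weblog")) ? "A" : "B";
  }

  public void map(LongWritable offset, Text line,
                  OutputCollector<Text, Text> out, Reporter reporter)
      throws IOException {
    if ("A".equals(logType(inputFile))) {
      // ... parse as Log File A ...
    } else {
      // ... parse as Log File B ...
    }
  }

  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(LogMapper.class);
    conf.setJobName("multi-log");
    conf.setMapperClass(LogMapper.class);
    // Two input directories - no need to copy files between them.
    conf.addInputPath(new Path("/logs/dirA"));
    conf.addInputPath(new Path("/logs/dirB"));
    JobClient.runJob(conf);
  }
}
```

Note that later Hadoop releases moved addInputPath onto FileInputFormat, so the JobConf method shown here applies to the 0.x API the thread links to.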
