Thanks guys, your responses were very useful. -Taran
On Tue, Mar 4, 2008 at 5:56 PM, lohit <[EMAIL PROTECTED]> wrote:
> Hi Tarandeep,
>
> The JobConf you get in your configure() method has the info.
> It is available via the map.input.file parameter (more info here:
> http://wiki.apache.org/hadoop/TaskExecutionEnvironment)
>
> Yes, you can have multiple input directories.
> You can use JobConf::addInputPath() to add more input paths before
> submitting your job. More info here:
> http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/JobConf.html#addInputPath(org.apache.hadoop.fs.Path)
>
> Thanks,
> Lohit
>
>
> ----- Original Message ----
> From: Tarandeep Singh <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Tuesday, March 4, 2008 5:38:41 PM
> Subject: Processing multiple files - need to identify in map
>
> Hi,
>
> I need to identify which file a key came from in the map phase.
> Is that possible?
>
> What I have is multiple types of log files in one directory that I
> need to process for my application. Right now, I am relying on the
> structure of the log files (e.g. if a line starts with "weblog", the
> line came from Log File A, or if the number of tab-separated fields in
> the line is N, then it is Log File B).
>
> Is there a better way to do this?
>
> Is there a way to have the Hadoop framework pass me the path of the
> file as the key (right now it is the offset in the file, I guess)?
>
> One more related question - can I set 2 directories as input to my
> map/reduce program? This is just to avoid copying files from one log
> directory to another.
>
> thanks,
> Taran
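For readers finding this thread later, lohit's two suggestions can be sketched roughly as below, using the old org.apache.hadoop.mapred API that was current in 2008. The class name, directory paths, and the "weblog" check are illustrative assumptions, not from the thread; the small logType() helper is likewise hypothetical, added only so the routing logic is easy to see.

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class LogMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private String inputFile;

  @Override
  public void configure(JobConf job) {
    // The framework sets map.input.file to the path of the file
    // backing the current split.
    inputFile = job.get("map.input.file");
  }

  // Hypothetical helper: route a record by its source file's path
  // instead of sniffing the line's structure.
  static String logType(String path) {
    return (path != null && path.contains("weblog")) ? "A" : "B";
  }

  public void map(LongWritable offset, Text line,
                  OutputCollector<Text, Text> out, Reporter reporter)
      throws IOException {
    if ("A".equals(logType(inputFile))) {
      // ... parse as Log File A ...
    } else {
      // ... parse as Log File B ...
    }
  }

  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(LogMapper.class);
    conf.setJobName("multi-log");
    conf.setMapperClass(LogMapper.class);
    // Two input directories - no need to copy files between them.
    conf.addInputPath(new Path("/logs/dirA"));
    conf.addInputPath(new Path("/logs/dirB"));
    JobClient.runJob(conf);
  }
}
```

Note that later Hadoop releases moved addInputPath onto FileInputFormat, so the JobConf method shown here applies to the 0.x API the thread links to.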
