Amogh,

That sounds so awesome! Yeah, I wish I had that class now. Do you have any tips 
on how to create such a delegating class? The best I can come up with is to 
submit both files to the mapper using multiple input paths and then have an 
if statement at the beginning of the map that checks which file it's dealing 
with, but I'm skeptical that I can even make that work. Is there a way you 
know of that I could submit 2 mapper classes to the job?
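
For illustration, the "check which file it's dealing with" idea could be 
sketched as a small delegating class. This is plain Java, not Hadoop API: 
RecordHandler, PathDelegationSketch and the /data/... paths are hypothetical 
names, and the handlers only stand in for the two real mappers. In an actual 
mapper using the new API, the path would presumably be read once in setup() 
via ((FileSplit) context.getInputSplit()).getPath().

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical interface: one implementation per kind of input file.
interface RecordHandler {
    // Turns one XML chunk into an output string.
    String handle(String xmlChunk);
}

// Hypothetical delegating class; none of these names come from Hadoop.
class PathDelegationSketch {
    // Maps an input path prefix to the handler for files under that prefix.
    private final Map<String, RecordHandler> handlersByPrefix = new LinkedHashMap<>();

    void register(String pathPrefix, RecordHandler handler) {
        handlersByPrefix.put(pathPrefix, handler);
    }

    // Picks the first registered handler whose prefix matches the input path.
    String map(String inputPath, String xmlChunk) {
        for (Map.Entry<String, RecordHandler> e : handlersByPrefix.entrySet()) {
            if (inputPath.startsWith(e.getKey())) {
                return e.getValue().handle(xmlChunk);
            }
        }
        throw new IllegalArgumentException("No handler registered for " + inputPath);
    }

    public static void main(String[] args) {
        PathDelegationSketch mapper = new PathDelegationSketch();
        // Assumed layout: each XML file type lives under its own directory.
        mapper.register("/data/orders/", chunk -> "order:" + chunk.length());
        mapper.register("/data/users/", chunk -> "user:" + chunk.length());
        System.out.println(mapper.map("/data/orders/part-0.xml", "<o>1</o>"));
        System.out.println(mapper.map("/data/users/part-0.xml", "<u>ab</u>"));
    }
}
```

Keeping the lookup in one place means the per-file logic stays in separate 
classes, which is roughly what having two mapper classes would give you, 
while the job itself still sees a single mapper.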

-----Original Message-----
From: Amogh Vasekar [mailto:[email protected]] 
Sent: Wednesday, November 04, 2009 1:50 AM
To: [email protected]
Subject: Re: Multiple Input Paths

Hi Mark,
A future release of Hadoop will have a MultipleInputs class, akin to 
MultipleOutputs. This would allow you to have a different InputFormat and 
Mapper depending on the path you are getting the split from. It uses special 
Delegating[mapper/input] classes to resolve this. I understand backporting this 
is more or less out of the question, but the ideas there might provide pointers 
to help you solve your current problem.
Just a thought :)

Amogh


On 11/3/09 8:44 PM, "Mark Vigeant" <[email protected]> wrote:

Hey Vipul

No I haven't concatenated my files yet, and I was just thinking over how to 
approach the issue of multiple input paths.

I actually did what Amandeep hinted at: we wrote our own 
XMLInputFormat and XMLRecordReader. When configuring the job in my driver I set 
job.setInputFormatClass(XMLFileInputFormat.class), and what it does is send 
chunks of XML to the mapper as opposed to lines of text or whole files. So I 
specified the Line Delimiter in the XMLRecordReader (i.e. <startTag>) and 
everything in between the tags <startTag> and </startTag> is sent to the 
mapper. Inside the map function is where I parse the data and write it to the 
table.
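
As a rough illustration of the chunking described above, here is a minimal 
sketch in plain Java of splitting text into records delimited by a start and 
end tag. It is not the XMLRecordReader from this thread: TagChunkerSketch and 
chunks() are hypothetical names, it works on an in-memory String rather than 
an input split, it keeps the tags in each chunk (an assumption), and it does 
not handle nested tags of the same name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; a real record reader would stream from the split instead.
class TagChunkerSketch {
    // Returns each substring that runs from startTag through endTag, tags included.
    static List<String> chunks(String text, String startTag, String endTag) {
        List<String> out = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = text.indexOf(startTag, from);
            if (start < 0) {
                break; // no more records
            }
            int end = text.indexOf(endTag, start + startTag.length());
            if (end < 0) {
                break; // unterminated record: dropped in this sketch
            }
            end += endTag.length();
            out.add(text.substring(start, end));
            from = end; // continue scanning after this record
        }
        return out;
    }

    public static void main(String[] args) {
        String xml = "<doc><startTag>a</startTag>\n<startTag>b</startTag></doc>";
        for (String chunk : chunks(xml, "<startTag>", "</startTag>")) {
            System.out.println(chunk);
        }
    }
}
```

Because the delimiter tags are plain parameters here, the same routine could 
serve two XML files with different record tags if the tag pair were chosen 
per input file.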

What I have to do now is just figure out how to set the Line Delimiter to be 
something common in both XML files I'm reading. Currently I have 2 mapper 
classes and thus 2 submitted jobs which is really inefficient and time 
consuming.

Make sense at all? Sorry if it doesn't; feel free to ask more questions.

Mark

-----Original Message-----
From: Vipul Sharma [mailto:[email protected]]
Sent: Monday, November 02, 2009 7:48 PM
To: [email protected]
Subject: RE: Multiple Input Paths

Mark,

Were you able to concatenate both the XML files together? What did you do to
keep the resulting XML well formed?

Regards,
Vipul Sharma,
Cell: 281-217-0761
