Hey Vipul,

No, I haven't concatenated my files yet; I was just thinking over how to approach the issue of multiple input paths.
I actually did what Amandeep hinted at: we wrote our own XMLInputFormat and XMLRecordReader. When configuring the job in my driver I call job.setInputFormatClass(XMLFileInputFormat.class), which sends chunks of XML to the mapper instead of lines of text or whole files. I specified the line delimiter in the XMLRecordReader (i.e. <startTag>), so everything between <startTag> and </startTag> is sent to the mapper. The map function is where I parse the data and write it to the table.

What I have to do now is figure out how to set the line delimiter to something common to both XML files I'm reading. Currently I have 2 mapper classes and thus 2 submitted jobs, which is really inefficient and time consuming.

Make sense at all? Sorry if it doesn't, feel free to ask more questions.

Mark

-----Original Message-----
From: Vipul Sharma [mailto:[email protected]]
Sent: Monday, November 02, 2009 7:48 PM
To: [email protected]
Subject: RE: Multiple Input Paths

Mark, were you able to concatenate both the xml files together? What did you do to keep the resulting xml well formed?

Regards,
Vipul Sharma,
Cell: 281-217-0761
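
For anyone following the thread, the tag-delimited chunking described in the message can be illustrated without any Hadoop dependencies. The sketch below is not the XMLRecordReader from the message; the class name XmlChunker and the method extractRecords are hypothetical, and it assumes records are not nested inside records of the same tag and that the start tag carries no attributes. It only shows the idea of handing the mapper everything from <startTag> through </startTag>.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not the XMLRecordReader from the message.
class XmlChunker {

    // Returns every substring that begins with startTag and ends with the
    // next endTag, tags included. Text outside those spans is skipped.
    static List<String> extractRecords(String input, String startTag, String endTag) {
        List<String> records = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = input.indexOf(startTag, from);
            if (start < 0) {
                break; // no more records
            }
            int end = input.indexOf(endTag, start + startTag.length());
            if (end < 0) {
                break; // unterminated record: drop it
            }
            end += endTag.length();
            records.add(input.substring(start, end));
            from = end; // continue scanning after this record
        }
        return records;
    }

    public static void main(String[] args) {
        String xml = "<root><startTag>a</startTag><other/><startTag>b</startTag></root>";
        // Each element is what a mapper would receive as one record.
        for (String rec : extractRecords(xml, "<startTag>", "</startTag>")) {
            System.out.println(rec);
        }
    }
}
```

Because the tag pair is a parameter rather than a constant, the same scanning logic could in principle serve both XML files if each input were given its own tag, but how that would be wired into the job configuration is left open here.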
