Hey Steve,

Look at the mailing list archives - there's a specialized input splitter that you could use; at least 2 different people have suggested it.

Brian

On Nov 16, 2009, at 2:02 PM, Steve Gao wrote:

> Thanks. But this is not a neat solution in case the XML block is very
> large.
> Does anybody have another solution? Thanks!
>
> --- On Thu, 10/29/09, Amandeep Khurana <[email protected]> wrote:
>
> From: Amandeep Khurana <[email protected]>
> Subject: Re: What if an XML file is accross boundary of HDFS chunks?
> To: [email protected]
> Date: Thursday, October 29, 2009, 5:12 PM
>
> Store the entire xml in one line...
>
> On 10/29/09, Steve Gao <[email protected]> wrote:
>> Does anybody have a similar issue? If you store XML files in HDFS, how can
>> you make sure a chunk read by a mapper does not contain partial data of an
>> XML segment?
>>
>> For example:
>>
>> <title>
>> <book>book1</book>
>> <author>me</author>
>> ..............what if this is the boundary of a chunk?...................
>> <year>2009</year>
>> <book>book2</book>
>> <author>me</author>
>> <year>2009</year>
>> <book>book3</book>
>> <author>me</author>
>> <year>2009</year>
>> </title>
>
> --
> Amandeep Khurana
> Computer Science Graduate Student
> University of California, Santa Cruz
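
For anyone reading this thread later, here is a rough sketch of the general idea behind a tag-aware input splitter. It is not the splitter from the archives and it does not use any Hadoop API; it only simulates splits as byte ranges over an in-memory buffer. The names read_split and read_all_splits are hypothetical, and treating <book> as the start of a record and </year> as its end is an assumption taken from the example above. The rule is that a record belongs to the split in which its start tag begins, and the reader is allowed to read past the end of its split to finish that record.

```python
# Simulation only: byte ranges over a bytes object stand in for HDFS splits.
# The tag choice (<book> ... </year>) is an assumption based on the example
# in this thread; a real job would make both tags configurable.

DATA = (
    b"<title>"
    b"<book>book1</book><author>me</author><year>2009</year>"
    b"<book>book2</book><author>me</author><year>2009</year>"
    b"<book>book3</book><author>me</author><year>2009</year>"
    b"</title>"
)


def read_split(data, start, end, start_tag=b"<book>", end_tag=b"</year>"):
    """Return the records whose start tag begins inside [start, end)."""
    records = []
    pos = start
    while True:
        s = data.find(start_tag, pos)
        # A start tag at or beyond the split end belongs to a later split.
        if s == -1 or s >= end:
            break
        # The end tag may lie past `end`; keep reading to finish the record.
        e = data.find(end_tag, s + len(start_tag))
        if e == -1:
            break  # truncated record at the end of the file
        e += len(end_tag)
        records.append(data[s:e])
        pos = e
    return records


def read_all_splits(data, split_size):
    """Cut `data` into fixed-size splits and read each one independently."""
    records = []
    for start in range(0, len(data), split_size):
        end = min(start + split_size, len(data))
        records.extend(read_split(data, start, end))
    return records


if __name__ == "__main__":
    # 40 bytes is arbitrary; it makes boundaries fall inside records.
    for record in read_all_splits(DATA, 40):
        print(record.decode())
```

Because each record is owned by exactly one split, no record is dropped or read twice wherever the boundaries fall, and nothing requires the XML to be stored on a single line. The sketch assumes records are not nested inside each other.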
