Hey Steve,

I think I've run across code in SVN that is a splitter for XML entries like this. Look at StreamXmlRecordReader; I think it does what you want.
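
The general idea behind a record reader of that kind can be sketched in a few lines of Python. This is only an illustration of the technique, not the actual Hadoop code: the function name read_records, the begin/end markers, and the in-memory byte string standing in for an HDFS file are all assumptions made for the example. Each reader scans forward from the start of its split to the next begin marker, and is allowed to read past the end of its split to finish the record it started, so every record is handled by exactly one mapper.

```python
def read_records(data, split_start, split_end, begin=b"<book>", end=b"</year>"):
    """Return the records whose begin marker starts inside [split_start, split_end).

    Hypothetical helper for illustration; 'data' stands in for the whole file.
    """
    records = []
    pos = split_start
    while True:
        rec_start = data.find(begin, pos)
        # A record belongs to the split in which its begin marker starts.
        if rec_start == -1 or rec_start >= split_end:
            break
        rec_end = data.find(end, rec_start + len(begin))
        if rec_end == -1:
            break  # unterminated record at end of file; skipped in this sketch
        rec_end += len(end)
        # The slice may extend past split_end so the record is read whole.
        records.append(data[rec_start:rec_end])
        pos = rec_end
    return records


data = (b"<title>"
        b"<book>book1</book><author>me</author><year>2009</year>"
        b"<book>book2</book><author>me</author><year>2009</year>"
        b"<book>book3</book><author>me</author><year>2009</year>"
        b"</title>")

# Put the chunk boundary in the middle of the first record.
boundary = data.find(b"<author>")
first = read_records(data, 0, boundary)
second = read_records(data, boundary, len(data))
```

Here the first split yields the whole of the book1 record even though the boundary cuts through it, and the second split skips the leftover tail of that record and starts at book2.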

Brian

On Oct 29, 2009, at 4:12 PM, Amandeep Khurana wrote:

Store the entire xml in one line...

On 10/29/09, Steve Gao <[email protected]> wrote:

Does anybody have a similar issue? If you store XML files in HDFS, how can you make sure a chunk read by a mapper does not contain partial data of an XML segment?

For example:

<title>
<book>book1</book>
<author>me</author>
..............what if this is the boundary of a chunk?...................
<year>2009</year>
<book>book2</book>
<author>me</author>
<year>2009</year>
<book>book3</book>
<author>me</author>
<year>2009</year>
</title>

--
Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz
