Hey Steve,

I think I've run across code in SVN that splits XML entries like this. Take a look at StreamXmlRecordReader; I think it does what you want.
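
Roughly, the idea behind a reader like that can be sketched as below. This is not the Hadoop code, just a small Python illustration of the general begin/end-marker approach: a record belongs to the chunk in which its begin tag starts, and the reader is allowed to read past the end of its chunk to finish that record. The function name and the <record> tags are made up for the example.

```python
def read_records(data: bytes, start: int, end: int,
                 begin_tag: bytes = b"<record>",
                 end_tag: bytes = b"</record>"):
    """Yield every complete record whose begin tag starts inside [start, end)."""
    pos = start
    while True:
        b = data.find(begin_tag, pos)
        # A record is owned by the chunk in which its begin tag starts.
        if b == -1 or b >= end:
            return
        e = data.find(end_tag, b + len(begin_tag))
        if e == -1:
            return  # unterminated record at the end of the data
        e += len(end_tag)
        # The record may extend past `end`; the reader of the next chunk
        # skips it because its begin tag lies before that chunk's start.
        yield data[b:e]
        pos = e


# Hypothetical data: three records, with a chunk boundary inside the second.
data = b"<record>a</record><record>bb</record><record>c</record>"
first_chunk = list(read_records(data, 0, 25))
second_chunk = list(read_records(data, 25, len(data)))
```

With the boundary at byte 25, the first chunk's reader returns the first two records whole and the second chunk's reader returns only the third, so nothing is cut in half or read twice.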

Brian

On Oct 29, 2009, at 4:12 PM, Amandeep Khurana wrote:

Store the entire XML on one line...
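
Something along these lines, as a rough sketch in Python (the function name is made up, and it assumes whitespace between tags is not significant in your data):

```python
import re


def to_single_line(xml_text: str) -> str:
    """Collapse one XML record onto a single line."""
    # Drop whitespace (including newlines) that sits between tags.
    collapsed = re.sub(r">\s+<", "><", xml_text.strip())
    # Turn any line breaks left inside text content into single spaces.
    return re.sub(r"\s*\n\s*", " ", collapsed)


record = "<title>\n<book>book1</book>\n<author>me</author>\n<year>2009</year>\n</title>\n"
one_line = to_single_line(record)
```

Once every record sits on its own line, a line-oriented input format never hands a mapper half a record.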

On 10/29/09, Steve Gao <[email protected]> wrote:
Does anybody have a similar issue? If you store XML files in HDFS, how can you make sure a chunk read by a mapper does not contain partial data of an
XML segment?

For example:

<title>
<book>book1</book>
<author>me</author>
..............what if this is the boundary of a chunk?...................
<year>2009</year>
<book>book2</book>
<author>me</author>
<year>2009</year>
<book>book3</book>
<author>me</author>
<year>2009</year>
</title>

--


Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz
