Has anybody run into a similar issue? If you store XML files in HDFS, how can 
you make sure that a chunk read by a mapper does not contain a partial XML 
segment?

For example:

<title>
<book>book1</book>
<author>me</author>
..............what if this is the boundary of a chunk?...................
<year>2009</year>
<book>book2</book>
<author>me</author>
<year>2009</year>
<book>book3</book>
<author>me</author>
<year>2009</year>
</title>
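
To make the question concrete, here is a minimal sketch of the behavior I would hope for from a split-aware reader. It is plain Python over an in-memory byte string, not Hadoop code. It assumes each logical record is wrapped in a hypothetical <record> element (my example above has no such wrapper), and the function name and parameters are made up for illustration. The rule it follows: a record belongs to the split in which its start tag begins, and the reader may read past the end of its split to finish that record.

```python
from typing import List


def read_records(data: bytes, start: int, end: int,
                 start_tag: bytes = b"<record>",
                 end_tag: bytes = b"</record>") -> List[bytes]:
    """Return the complete records whose start tag begins in [start, end).

    Hypothetical helper: `data` stands in for the whole file, and
    (start, end) for the byte range of one split.
    """
    records = []
    pos = start
    while True:
        begin = data.find(start_tag, pos)
        # A record is ours only if its start tag begins before `end`;
        # anything starting later belongs to a following split.
        if begin == -1 or begin >= end:
            break
        close = data.find(end_tag, begin + len(start_tag))
        if close == -1:
            break  # unterminated record: nothing complete to emit
        stop = close + len(end_tag)  # may lie beyond `end`, on purpose
        records.append(data[begin:stop])
        pos = stop
    return records


# Three records, with the split boundary placed inside the first one.
data = (
    b"<record><book>book1</book><author>me</author><year>2009</year></record>"
    b"<record><book>book2</book><author>me</author><year>2009</year></record>"
    b"<record><book>book3</book><author>me</author><year>2009</year></record>"
)
boundary = data.index(b"<year>")  # between <author> and <year> of book1

first_split = read_records(data, 0, boundary)
second_split = read_records(data, boundary, len(data))
```

With this rule the first split reads past the boundary to finish book1, and the second split skips the tail of book1 and starts at book2, so no record is cut in half or seen twice. Whether an existing input format already does this for XML is exactly what I am asking.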