Thanks, but this is not a neat solution when the XML block is very large.
Does anybody have another solution? Thanks!

--- On Thu, 10/29/09, Amandeep Khurana <[email protected]> wrote:

From: Amandeep Khurana <[email protected]>
Subject: Re: What if an XML file is across boundary of HDFS chunks?
To: [email protected]
Date: Thursday, October 29, 2009, 5:12 PM

Store the entire xml in one line...

On 10/29/09, Steve Gao <[email protected]> wrote:
> Has anybody had a similar issue? If you store XML files in HDFS, how can
> you make sure a chunk read by a mapper does not contain partial data of an
> XML segment?
>
> For example:
>
> <title>
> <book>book1</book>
> <author>me</author>
> ..............what if this is the boundary of a chunk?...................
> <year>2009</year>
> <book>book2</book>
>
> <author>me</author>
>
> <year>2009</year>
> <book>book3</book>
>
> <author>me</author>
>
> <year>2009</year>
> </title>


-- 


Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz



      
