Hey Steve,

Look at the mailing list archives: at least two different people have suggested
a specialized input splitter that you could use.
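
For anyone who wants to see the idea behind such a splitter, here is a minimal sketch in plain Python, not the Hadoop API. It assumes records are delimited by a tag pair that does not nest, and the names `read_records`, `split_ranges` and the `<item>` tag are made up for illustration. The convention is that a record belongs to the split in which its opening tag starts, and the reader may read past the end of its own split to finish that record, so no mapper sees a partial record and no record is read twice.

```python
from typing import Iterator, List, Tuple


def split_ranges(length: int, size: int) -> List[Tuple[int, int]]:
    """Cut [0, length) into consecutive byte ranges of at most `size` bytes.

    This stands in for fixed-size chunk boundaries (hypothetical helper).
    """
    return [(s, min(s + size, length)) for s in range(0, length, size)]


def read_records(data: bytes, start: int, end: int,
                 open_tag: bytes, close_tag: bytes) -> Iterator[bytes]:
    """Yield complete records whose opening tag begins inside [start, end).

    Assumes the tags do not nest. A record may extend beyond `end`;
    the reader keeps going until it has seen the closing tag.
    """
    pos = start
    while True:
        begin = data.find(open_tag, pos)
        # No more records, or the next record belongs to a later split.
        if begin == -1 or begin >= end:
            return
        finish = data.find(close_tag, begin)
        if finish == -1:
            return  # truncated record at end of file: skip it
        finish += len(close_tag)
        yield data[begin:finish]
        pos = finish


# Made-up sample data; the 20-byte split size cuts through the records.
data = b"<items><item>book1</item><item>book2</item><item>book3</item></items>"
records = []
for s, e in split_ranges(len(data), 20):
    records.extend(read_records(data, s, e, b"<item>", b"</item>"))
```

In a real job the bytes read past the end of a split come from the next HDFS block, which may live on another node, so a very large record costs some extra network traffic but still reaches a single mapper intact.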

Brian

On Nov 16, 2009, at 2:02 PM, Steve Gao wrote:

> Thanks, but this is not a neat solution when the XML block is very
> large.
> Does anybody have another solution? Thanks!
> 
> --- On Thu, 10/29/09, Amandeep Khurana <[email protected]> wrote:
> 
> From: Amandeep Khurana <[email protected]>
> Subject: Re: What if an XML file is across boundary of HDFS chunks?
> To: [email protected]
> Date: Thursday, October 29, 2009, 5:12 PM
> 
> Store the entire xml in one line...
> 
> On 10/29/09, Steve Gao <[email protected]> wrote:
>> Does anybody have a similar issue? If you store XML files in HDFS, how can
>> you make sure a chunk read by a mapper does not contain partial data of an
>> XML segment?
>> 
>> For example:
>> 
>> <title>
>> <book>book1</book>
>> <author>me</author>
>> ..............what if this is the boundary of a chunk?...................
>> <year>2009</year>
>> <book>book2</book>
>> 
>> <author>me</author>
>> 
>> <year>2009</year>
>> <book>book3</book>
>> 
>> <author>me</author>
>> 
>> <year>2009</year>
>> </title>
>> 
>> 
>> 
>> 
> 
> 
> -- 
> 
> 
> Amandeep Khurana
> Computer Science Graduate Student
> University of California, Santa Cruz
> 
> 
> 
