Alan,

On Mon, Dec 10, 2007 at 01:12:28AM -0800, Alan Ho wrote:
>I've written an XML input splitter based on a StAX parser. It's much better than 
>StreamXmlRecordReader.
>

We'd definitely like to see something like this in Hadoop. Would you mind 
contributing it?

Details: http://wiki.apache.org/lucene-hadoop/HowToContribute
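
For anyone following the thread, here is a rough sketch of the general idea behind StAX-based record extraction. It is not Alan's splitter and does not use any Hadoop classes. It only shows how a pull parser can walk a document and emit one record per element. The class name StaxRecordSketch, the method readRecords, and the <record> tag are hypothetical names chosen for illustration; a real InputFormat would also have to deal with split boundaries and compression.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration only; not part of Hadoop.
class StaxRecordSketch {

    // Returns the trimmed character data of every <recordTag> element,
    // including any text found in elements nested inside it.
    // Assumes records with the same tag are not nested in each other.
    static List<String> readRecords(Reader in, String recordTag)
            throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(in);
        List<String> records = new ArrayList<>();
        StringBuilder current = null; // non-null while inside a record
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && recordTag.equals(reader.getLocalName())) {
                    current = new StringBuilder();
                } else if ((event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.CDATA)
                        && current != null) {
                    current.append(reader.getText());
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && current != null
                        && recordTag.equals(reader.getLocalName())) {
                    records.add(current.toString().trim());
                    current = null;
                }
            }
        } finally {
            reader.close();
        }
        return records;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<root><record>first</record><record>second</record></root>";
        for (String r : readRecords(new StringReader(xml), "record")) {
            System.out.println(r);
        }
    }
}
```

Because the parser is pull-based, the loop never holds more than one record in memory, which is the main reason StAX suits large inputs better than building a DOM per file.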

thanks,
Arun

>----- Original Message ----
>From: Peter Thygesen <[EMAIL PROTECTED]>
>To: [email protected]
>Sent: Monday, November 26, 2007 8:49:52 AM
>Subject: MapReduce Job on XML input
>
>I would like to run some MapReduce jobs on some XML files I have (approx.
>100000 compressed files). 
>The XML files are not that big, about 1 MB compressed, each containing
>about 1000 records. 
>
>Do I have to write my own InputSplitter? Should I use
>MultiFileInputFormat or StreamInputFormat? Can I use the
>StreamXmlRecordReader, and how? By sub-classing some input class?
>
>The tutorials and examples I've read are all very straightforward,
>reading simple text files, but I miss a more complex example, especially
>one that reads XML files ;) 
>
>thx. 
>Peter
