I've written an XML input splitter based on a StAX parser. It's much better than
StreamXmlRecordReader.
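
The splitter itself is not included in this message. As a rough illustration
of the general approach only (not the actual implementation), a StAX pull
parser can walk the stream and emit one record per matching element. The class
name StaxRecordSplitter, the method splitRecords, and the "record" element
name are all assumptions made up for this sketch; a real Hadoop input format
would also have to deal with split boundaries and compressed input, which this
does not attempt.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: names are illustrative, not part of Hadoop.
public class StaxRecordSplitter {

    /** Returns the trimmed text content of every top-level recordTag element. */
    public static List<String> splitRecords(String xml, String recordTag)
            throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        List<String> records = new ArrayList<>();
        StringBuilder current = null; // non-null while inside a record element
        int depth = 0;                // element nesting depth inside the record
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    if (current != null) {
                        depth++; // nested element inside the current record
                    } else if (recordTag.equals(reader.getLocalName())) {
                        current = new StringBuilder();
                        depth = 1;
                    }
                } else if (event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.CDATA) {
                    if (current != null) {
                        current.append(reader.getText());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    if (current != null && --depth == 0) {
                        // Closing tag of the record itself: emit it.
                        records.add(current.toString().trim());
                        current = null;
                    }
                }
            }
        } finally {
            reader.close();
        }
        return records;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<root><record>a</record><record><name>b</name></record></root>";
        for (String record : splitRecords(xml, "record")) {
            System.out.println(record);
        }
    }
}
```

Because StAX is a pull parser, the reader only advances when asked, which is
what makes it a natural fit for a record reader that hands out one record per
call instead of loading a whole document into memory.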

----- Original Message ----
From: Peter Thygesen <[EMAIL PROTECTED]>
To: hadoop-user@lucene.apache.org
Sent: Monday, November 26, 2007 8:49:52 AM
Subject: MapReduce Job on XML input

I would like to run some MapReduce jobs on some XML files I have (approx.
100000 compressed files).
The XML files are not that big, about 1 MB compressed, each containing
about 1000 records.

Do I have to write my own InputSplitter? Should I use
MultiFileInputFormat or StreamInputFormat? Can I use the
StreamXmlRecordReader, and how? By sub-classing some input class?

The tutorials and examples I've read are all very straightforward,
reading simple text files, but I miss a more complex example, especially
one that reads XML files ;)

thx. 
Peter






