I've written an XML input splitter based on a StAX parser. It's much better than StreamXmlRecordReader.
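The splitter itself isn't attached here, so the following is only a rough illustration of the StAX approach, not that code. It is a minimal sketch that uses nothing but `javax.xml.stream` from the JDK. It pulls events from the input and copies out each `<record>` subtree as a string, holding only the current record in memory. StreamXmlRecordReader, by contrast, matches configured begin/end strings. The class name `StaxRecordSplitter`, the method `splitRecords` and the tag name `record` are hypothetical. The sketch also assumes that dropping namespace prefixes, comments and processing instructions inside a record is acceptable. Wiring it into a Hadoop InputFormat/RecordReader is not shown.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: streams an XML document and returns every <recordTag> subtree as XML text.
public class StaxRecordSplitter {

    public static List<String> splitRecords(InputStream in, String recordTag) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE); // merge adjacent text events
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);  // don't process DTDs
        XMLStreamReader reader = factory.createXMLStreamReader(in);
        List<String> records = new ArrayList<>();
        StringBuilder current = null; // non-null only while inside a record
        int depth = 0;                // element nesting depth inside the current record
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                switch (event) {
                    case XMLStreamConstants.START_ELEMENT:
                        if (current == null && reader.getLocalName().equals(recordTag)) {
                            current = new StringBuilder(); // a new record begins
                        }
                        if (current != null) {
                            depth++;
                            // Assumption: local names only; namespace prefixes are dropped.
                            current.append('<').append(reader.getLocalName());
                            for (int i = 0; i < reader.getAttributeCount(); i++) {
                                current.append(' ').append(reader.getAttributeLocalName(i))
                                       .append("=\"").append(escape(reader.getAttributeValue(i))).append('"');
                            }
                            current.append('>');
                        }
                        break;
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                    case XMLStreamConstants.SPACE:
                        if (current != null) {
                            current.append(escape(reader.getText())); // re-escape decoded text
                        }
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        if (current != null) {
                            current.append("</").append(reader.getLocalName()).append('>');
                            if (--depth == 0) { // closing tag of the record itself
                                records.add(current.toString());
                                current = null;
                            }
                        }
                        break;
                    default:
                        break; // comments, processing instructions, etc. are skipped
                }
            }
        } finally {
            reader.close(); // does not close the underlying InputStream
        }
        return records;
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><record id=\"1\"><name>a &amp; b</name></record>"
                   + "<other/><record id=\"2\"><name>c</name></record></root>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        for (String r : splitRecords(in, "record")) {
            System.out.println(r);
        }
    }
}
```

Running `main` prints `<record id="1"><name>a &amp; b</name></record>` and then `<record id="2"><name>c</name></record>`. The `<other/>` element outside any record is skipped. The depth counter means a nested element that happens to share the record's tag name stays inside its enclosing record.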
----- Original Message ----
From: Peter Thygesen <[EMAIL PROTECTED]>
To: hadoop-user@lucene.apache.org
Sent: Monday, November 26, 2007 8:49:52 AM
Subject: MapReduce Job on XML input

I would like to run some MapReduce jobs on some XML files I have (approx. 100,000 compressed files). The XML files are not that big: about 1 MB compressed, each containing about 1,000 records.

Do I have to write my own InputSplitter? Should I use MultiFileInputFormat or StreamInputFormat? Can I use the StreamXmlRecordReader, and how? By subclassing some input class?

The tutorials and examples I've read are all very straightforward, reading simple text files, but I'm missing a more complex example, especially one that reads XML files ;)

thx.
Peter