I would like to run some MapReduce jobs on a set of XML files I have
(approx. 100,000 compressed files).
The XML files are not that big: about 1 MB compressed, each containing
about 1000 records.

Do I have to write my own InputSplitter? Should I use
MultiFileInputFormat or StreamInputFormat? Can I use the
StreamXmlRecordReader, and if so, how? By subclassing one of the input
classes?

The tutorials and examples I've read are all very straightforward,
reading simple text files, but I am missing a more complex example,
especially one that reads XML files ;)

thx. 
Peter

