I would like to run some MapReduce jobs on a set of XML files I have (approx. 100,000 compressed files). The XML files are not that big: about 1 MB each when compressed, with about 1,000 records per file.
Do I have to write my own InputSplitter? Should I use MultiFileInputFormat or StreamInputFormat? Can I use the StreamXmlRecordReader, and if so, how? By subclassing some input class? The tutorials and examples I've read are all very straightforward and only read simple text files, but I am missing a more complex example, especially one that reads XML files ;) Thanks. Peter
