That isn't very many files. At about 1MB each, you shouldn't see much of a
performance hit from reading many separate files.

You will need a special input format, but it can be very simple. Just extend
something like TextInputFormat, replace the record reader, and report the
file as unsplittable (override isSplitable() so that it returns false).
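Here is a minimal sketch of the core logic such a replacement record reader might wrap. It assumes the files are gzip-compressed UTF-8 XML and that each record is delimited by a hypothetical <record>...</record> element. XmlRecordExtractor and the tag names are illustrative and are not Hadoop classes. The Hadoop wiring itself (subclassing TextInputFormat, returning false from isSplitable(), handing back a custom RecordReader) is left out because those signatures have changed between Hadoop versions. Reading each file whole is reasonable here: gzip files can't be split anyway, and about 1MB compressed fits comfortably in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper (not a Hadoop class): the record-splitting logic a
// custom RecordReader could wrap. Assumes gzip-compressed UTF-8 XML files
// and records delimited by a start-tag prefix and an end tag.
public class XmlRecordExtractor {

    // Reads the whole compressed file (fine at ~1MB) and returns every
    // span from startTag through the next endTag, in file order.
    public static List<String> extract(InputStream gzipped, String startTag, String endTag)
            throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader in = new InputStreamReader(new GZIPInputStream(gzipped), StandardCharsets.UTF_8)) {
            char[] buf = new char[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                text.append(buf, 0, n);
            }
        }
        List<String> records = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = text.indexOf(startTag, from);
            if (start < 0) {
                break; // no more records
            }
            int end = text.indexOf(endTag, start + startTag.length());
            if (end < 0) {
                break; // truncated record at end of file: drop it
            }
            end += endTag.length();
            records.add(text.substring(start, end));
            from = end;
        }
        return records;
    }

    // Demo helper: gzip a string in memory so the example needs no files.
    static byte[] gzip(String s) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(s.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String xml = "<doc><record id=\"1\">a</record><record id=\"2\">b</record></doc>";
        // "<record" (no closing '>') so start tags with attributes still match.
        List<String> recs = extract(new ByteArrayInputStream(gzip(xml)), "<record", "</record>");
        for (String r : recs) {
            System.out.println(r); // each chunk would become one map() value
        }
    }
}
```

Running main prints <record id="1">a</record> and <record id="2">b</record> on separate lines. Plain start/end string matching like this is similar in spirit to the begin/end-tag approach of StreamXmlRecordReader. It won't handle nested records, and it also won't handle element names that share the start-tag prefix (e.g. <recordset>). A real reader may therefore need stricter matching.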


On 11/26/07 8:49 AM, "Peter Thygesen" <[EMAIL PROTECTED]> wrote:

> I would like to run some MapReduce jobs on some XML files I have (approx.
> 100000 compressed files).
> The XML files are not that big: about 1 MB compressed, each containing
> about 1000 records.
> 
> Do I have to write my own InputSplitter? Should I use
> MultiFileInputFormat or StreamInputFormat? Can I use the
> StreamXmlRecordReader, and how? By sub-classing some input class?
> 
> The tutorials and examples I've read are all very straightforward,
> reading simple text files, but I'm missing a more complex example, especially
> one that reads XML files ;)
> 
> thx. 
> Peter
> 
> 
