Per,

On Fri, Sep 2, 2011 at 12:33 AM, Per Steffensen <st...@designware.dk> wrote:
> Yes I found CombineFileInputFormat. It worries me a little though to see
> that it extends the deprecated FileInputFormat instead of the new
> FileInputFormat. Is that a problem?
> Also I notice that CombineFileInputFormat is abstract. Why is that? Is the
> extension shown on the following webpage a good way out of this:
> http://blog.yetitrails.com/2011/04/dealing-with-lots-of-small-files-in.html

It is abstract because it does not ship with a record reader; you have
to supply one that matches your files. Even FileInputFormat is unusable
on its own: you generally use TextInputFormat or SequenceFileInputFormat,
depending on your file format. It's not difficult to extend it to fit
your requirements, though :)

That blog post looks good to me as an example. Do adapt it to the
record reader you require (LineRecordReader, SequenceFileRecordReader,
etc.).

Regarding the stable/new API: for 0.20 releases, please disregard the
deprecation of the old (mapred) API. It was undeprecated later and
re-deemed stable.

If you'd still like to use the new API for this class, you may need to
pull it from a higher version's sources, or use a distro/release that
incorporates it (e.g. I use CDH3 here, and it has
CombineFileInputFormat in both the new and the stable API, thanks to
its tested backporting).

-- 
Harsh J
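
To illustrate why a combining input format is abstract until a record
reader is supplied, here is a minimal plain-Java sketch of the pattern.
It is not Hadoop code and does not use the Hadoop API: the names
SimpleRecordReader, CombineFormatSketch and LineCombineFormat are all
hypothetical, file sizes are approximated by string length, and the
greedy packing is only an assumption about how small files could be
grouped into splits.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a record reader: turns one file's content into records.
interface SimpleRecordReader {
    List<String> readRecords(String fileContent);
}

// Hypothetical model of an abstract "combine" input format. It knows how to
// pack many small files into a few splits, but not how to parse a file.
abstract class CombineFormatSketch {
    private final long maxSplitSize;

    CombineFormatSketch(long maxSplitSize) {
        this.maxSplitSize = maxSplitSize;
    }

    // The missing piece that keeps the base class abstract.
    abstract SimpleRecordReader createRecordReader();

    // Greedily pack files (given by content) into splits no larger than
    // maxSplitSize characters; a single oversized file gets its own split.
    List<List<String>> getSplits(List<String> fileContents) {
        List<List<String>> splits = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long currentSize = 0;
        for (String content : fileContents) {
            if (!current.isEmpty() && currentSize + content.length() > maxSplitSize) {
                splits.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(content);
            currentSize += content.length();
        }
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }

    // Read every file in one split with the subclass's record reader.
    List<String> readSplit(List<String> split) {
        SimpleRecordReader reader = createRecordReader();
        List<String> records = new ArrayList<>();
        for (String content : split) {
            records.addAll(reader.readRecords(content));
        }
        return records;
    }
}

// Concrete subclass: plugs in a line-oriented reader.
class LineCombineFormat extends CombineFormatSketch {
    LineCombineFormat(long maxSplitSize) {
        super(maxSplitSize);
    }

    @Override
    SimpleRecordReader createRecordReader() {
        return content -> {
            List<String> lines = new ArrayList<>();
            for (String line : content.split("\n")) {
                if (!line.isEmpty()) {
                    lines.add(line);
                }
            }
            return lines;
        };
    }
}
```

The base class owns the file-grouping logic and the subclass only
contributes the reader, which is the same division of labour as
extending CombineFileInputFormat with the record reader for your file
format.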