Per,

On Fri, Sep 2, 2011 at 12:33 AM, Per Steffensen <st...@designware.dk> wrote:
> Yes I found CombineFileInputFormat. It worries me a little though to see
> that it extends the deprecated FileInputFormat instead of the new
> FileInputFormat. Is that a problem?
> Also I notice that CombineFileInputFormat is abstract. Why is that? Is the
> extension shown on the following webpage a good way out of this:
> http://blog.yetitrails.com/2011/04/dealing-with-lots-of-small-files-in.html

It is abstract because it does not ship with a record reader, and
needs you to supply one suited to your files. Even FileInputFormat is
unusable on its own - you generally use the Text or SequenceFile input
formats depending on your file format. It's not difficult to extend it
to fit your requirements, though :)

That blog post looks good to me as an example. Do adapt it to the
proper record reader you require (LineRecordReader,
SequenceFileRecordReader, etc.).
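
To make the "abstract because it has no record reader" point a bit
more concrete, here is a rough sketch in plain Java with no Hadoop
dependency. Every name in it (SmallFile, SimpleRecordReader,
CombiningFormat, CombiningTextFormat) is made up for illustration and
is not a Hadoop class; the real CombineFileInputFormat deals with
paths, block locations and split size settings that this toy ignores.
It only shows the shape of the pattern: the base class packs many
small files into a few splits, and the subclass says how to read
records out of each file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a small input file: a name plus its text.
final class SmallFile {
    final String name;
    final String contents;

    SmallFile(String name, String contents) {
        this.name = name;
        this.contents = contents;
    }
}

// Hypothetical analogue of a record reader: turns one file into records.
interface SimpleRecordReader<V> {
    List<V> readAll(SmallFile file);
}

// Plays the role of the abstract combining format. It knows how to pack
// files into splits, but not how to parse them, so it stays abstract.
abstract class CombiningFormat<V> {
    private final long maxSplitSize; // measured in characters in this toy

    CombiningFormat(long maxSplitSize) {
        this.maxSplitSize = maxSplitSize;
    }

    // The one piece the subclass has to provide.
    protected abstract SimpleRecordReader<V> createRecordReader();

    // Greedy packing: start a new split when the next file would push
    // the current one past maxSplitSize.
    List<List<SmallFile>> getSplits(List<SmallFile> files) {
        List<List<SmallFile>> splits = new ArrayList<>();
        List<SmallFile> current = new ArrayList<>();
        long currentSize = 0;
        for (SmallFile f : files) {
            long size = f.contents.length();
            if (!current.isEmpty() && currentSize + size > maxSplitSize) {
                splits.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(f);
            currentSize += size;
        }
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }

    // Reads every file of one split with the subclass's reader.
    List<V> readSplit(List<SmallFile> split) {
        SimpleRecordReader<V> reader = createRecordReader();
        List<V> records = new ArrayList<>();
        for (SmallFile f : split) {
            records.addAll(reader.readAll(f));
        }
        return records;
    }
}

// Concrete variant for line-oriented text: one record per line.
final class CombiningTextFormat extends CombiningFormat<String> {
    CombiningTextFormat(long maxSplitSize) {
        super(maxSplitSize);
    }

    @Override
    protected SimpleRecordReader<String> createRecordReader() {
        return file -> {
            if (file.contents.isEmpty()) {
                return new ArrayList<>();
            }
            return new ArrayList<>(Arrays.asList(file.contents.split("\r?\n")));
        };
    }
}
```

A sequence-file variant would be another small subclass that returns
a different reader; the packing logic in the base class stays
untouched, which is roughly why the Hadoop class leaves that part to
you.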

Regarding the stable/new API: for 0.20 releases, please disregard the
deprecation of the old (mapred) API. It was undeprecated later and
re-deemed stable. If you'd still like to use the new API for this
class, you would need to pull it from a higher version's sources, or
use a distro/release that incorporates it (e.g. I use CDH3 here, and it
has CombineFileInputFormat in both the new and the stable API, thanks
to its tested backporting).

-- 
Harsh J
