Hi, Ian.
One reason is that a MapFile is represented by a directory containing
two files, named "index" and "data". SequenceFileInputFormat handles
MapFiles too: if an input file is actually a directory containing a
"data" file, it uses that data file.
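A minimal sketch of that MapFile rule, assuming the local filesystem via java.nio.file rather than Hadoop's FileSystem API; the class and method names are hypothetical and this is not SequenceFileInputFormat's actual code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration; not Hadoop code.
public class MapFileInputResolver {
    // Name of the data file inside a MapFile directory.
    static final String DATA_FILE_NAME = "data";

    /**
     * If the input path is a directory containing a "data" file (the MapFile
     * layout), return that data file; otherwise return the path unchanged.
     */
    static Path resolve(Path input) {
        if (Files.isDirectory(input)) {
            Path data = input.resolve(DATA_FILE_NAME);
            if (Files.isRegularFile(data)) {
                return data;
            }
        }
        return input;
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway MapFile-shaped directory: part-00000/{index,data}.
        Path root = Files.createTempDirectory("mapfile-demo");
        Path mapFileDir = Files.createDirectory(root.resolve("part-00000"));
        Files.createFile(mapFileDir.resolve("index"));
        Files.createFile(mapFileDir.resolve("data"));
        System.out.println(root.relativize(resolve(mapFileDir)));
    }
}
```

The "index" file is deliberately ignored: only the data file is handed on as input.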
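Another reason is that this one-level layout is what reduces generate:
each reduce task writes its output directly under the job's output
directory.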
Neither reason implies that this is the best or only way of doing
things. It would probably be better if FileInputFormat optionally
supported recursive file enumeration. (That would be an incompatible
change, so it could not be the default mode.)
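A minimal sketch of what an opt-in recursive mode might look like, assuming local paths via java.nio.file instead of Hadoop's FileSystem; InputFileLister and its methods are hypothetical names, not a proposed FileInputFormat API. The default overload keeps the one-level behaviour, so existing jobs would see no change:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not Hadoop code.
public class InputFileLister {
    /** Default mode: unchanged one-level listing. */
    static List<Path> listInputFiles(List<Path> inputDirs) throws IOException {
        return listInputFiles(inputDirs, false);
    }

    /**
     * With recursive == false, returns the immediate children of each input
     * directory (foo/a and foo/bar, but nothing under foo/bar). With
     * recursive == true, descends depth-first and returns only non-directories.
     */
    static List<Path> listInputFiles(List<Path> inputDirs, boolean recursive) throws IOException {
        List<Path> result = new ArrayList<>();
        for (Path dir : inputDirs) {
            collect(dir, recursive, result);
        }
        return result;
    }

    private static void collect(Path dir, boolean recursive, List<Path> out) throws IOException {
        List<Path> entries = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path entry : stream) {
                entries.add(entry);
            }
        }
        entries.sort(null); // directory order is unspecified; sort for stable output
        for (Path entry : entries) {
            if (recursive && Files.isDirectory(entry)) {
                collect(entry, true, out); // descend before moving to the next sibling
            } else {
                out.add(entry);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Path.of(args.length > 0 ? args[0] : ".");
        System.out.println("one level: " + listInputFiles(List.of(dir)));
        System.out.println("recursive: " + listInputFiles(List.of(dir), true));
    }
}
```

With recursion switched on, a MapFile directory would contribute both its "index" and "data" files, which is one reason the one-level listing is the safer default.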
Please file an issue in Jira for this and attach your patch.
Thanks,
Doug
Ian Soboroff wrote:
Is there a reason FileInputFormat only traverses the first level of
directories in its InputPaths? (i.e., given an InputPath of 'foo', it
will get foo/* but not foo/bar/*).
I wrote a full depth-first traversal in my custom InputFormat which I
can offer as a patch. But to do it I had to duplicate the PathFilter
classes in FileInputFormat, which are marked private, so a mainline patch
would also touch FileInputFormat.
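A minimal sketch of a depth-first traversal driven by a shared, publicly visible filter, so a custom InputFormat would not need its own copy. It assumes java.nio.file, with DirectoryStream.Filter standing in for Hadoop's PathFilter; the rule of skipping names that begin with "_" or "." is an assumption modeled on FileInputFormat's hidden-file filter, and all names here are hypothetical:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration; not Hadoop code.
public class DepthFirstInputLister {
    /** Assumed hidden-file rule, exposed publicly so subclasses can reuse it. */
    public static final DirectoryStream.Filter<Path> HIDDEN_FILE_FILTER = p -> {
        String name = p.getFileName().toString();
        return !name.startsWith("_") && !name.startsWith(".");
    };

    /**
     * Iterative depth-first traversal: returns every non-directory under root
     * that passes the filter. Rejected directories are never descended into.
     */
    public static List<Path> listFiles(Path root, DirectoryStream.Filter<Path> filter) throws IOException {
        List<Path> files = new ArrayList<>();
        Deque<Path> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Path dir = stack.pop();
            try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir, filter)) {
                for (Path entry : entries) {
                    if (Files.isDirectory(entry)) {
                        stack.push(entry); // explore this subtree before older siblings
                    } else {
                        files.add(entry);
                    }
                }
            }
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        Path root = Path.of(args.length > 0 ? args[0] : ".");
        for (Path file : listFiles(root, HIDDEN_FILE_FILTER)) {
            System.out.println(file);
        }
    }
}
```

An explicit stack avoids deep recursion on very nested trees; the order of siblings within one directory is whatever the filesystem returns.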
Ian