Lincoln Ritter wrote:
I can see from the private hiddenFileFilter (used by listPaths) that anything prefixed with '.' or '_' is considered hidden. I just want to make sure that this is "standard".
Yes, it is standard for mapreduce input and output directories.
I'm working on getting Nutch 0.9 working with Hadoop 0.17, and hidden files ("_logs") have been causing some issues. Granted, you can configure around this, but I've been looking for other solutions as well. If the hidden file behavior is well defined, it would be nice to provide documentation and a public interface for determining file visibility. Seems to me that splitting off 'hiddenFileFilter' into its own class or providing an accessor would be sufficient.
If Nutch cannot extend FileInputFormat then, yes, we should make this filter public. If that's the case, please submit a patch.
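For reference, here is a rough sketch of the rule under discussion, written as plain Java with no Hadoop dependency. The class and method names (HiddenFileFilterSketch, isHidden, visibleOnly) are made up for illustration and are not part of Hadoop or Nutch; the only thing taken from this thread is that names starting with '_' or '.' are treated as hidden.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not a Hadoop class: it only mirrors the naming rule
// described in this thread ('_' and '.' prefixes mark a file as hidden).
final class HiddenFileFilterSketch {
    private HiddenFileFilterSketch() {}

    /** Returns true when the last path component starts with '_' or '.'. */
    static boolean isHidden(String path) {
        // Take everything after the final '/', or the whole string if there is none.
        String name = path.substring(path.lastIndexOf('/') + 1);
        return name.startsWith("_") || name.startsWith(".");
    }

    /** Returns the paths whose last component is not hidden, in input order. */
    static List<String> visibleOnly(List<String> paths) {
        List<String> visible = new ArrayList<>();
        for (String p : paths) {
            if (!isHidden(p)) {
                visible.add(p);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        // Example directory listing; the paths are invented for this sketch.
        List<String> listing = Arrays.asList(
                "crawl/segments/part-00000",
                "crawl/segments/_logs",
                "crawl/segments/.part-00000.crc");
        System.out.println(visibleOnly(listing));
    }
}
```

Code that lists an output directory itself, rather than going through FileInputFormat, could apply a check like this to skip "_logs" and similar entries.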
Thanks, Doug