Lincoln Ritter wrote:
I can see from the private hiddenFileFilter (used by listPaths) that
'.' and '_' prefixed stuff is considered hidden, I just want to make
sure that this is "standard".

Yes, it is standard for mapreduce input and output directories.
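
For illustration, the naming rule described above can be sketched in plain Java. This is a stand-alone sketch and not the actual Hadoop source: the class name HiddenFileFilterSketch, its methods, and the sample file names are hypothetical, and it works on bare file names rather than on Hadoop Path objects.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-alone sketch of the "hidden file" naming rule;
// it does not depend on Hadoop and is not Hadoop's own filter.
public class HiddenFileFilterSketch {

    // A name counts as hidden when it starts with '_' or '.'.
    public static boolean isHidden(String fileName) {
        return fileName.startsWith("_") || fileName.startsWith(".");
    }

    // Keep only the names that would be treated as visible input.
    public static List<String> visibleOnly(List<String> fileNames) {
        return fileNames.stream()
                .filter(name -> !isHidden(name))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Illustrative directory listing (assumed names).
        List<String> listing = Arrays.asList(
                "part-00000", "_logs", ".part-00000.crc", "part-00001");
        System.out.println(visibleOnly(listing)); // prints [part-00000, part-00001]
    }
}
```

Under this rule a "_logs" directory is skipped when listing input, which is why code that lists the directory by some other route sees it unexpectedly.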

I'm working on getting Nutch 0.9 working with Hadoop 0.17 and hidden
files ("_logs") have been causing some issues.  Granted, you can
configure around this, but I've been looking for other solutions as
well.

If the hidden file behavior is well defined, it would be nice to
provide documentation, and a public interface for determining file
visibility.  Seems to me that splitting off 'hiddenFileFilter' into
its own class or providing an accessor would be sufficient.

If Nutch cannot extend FileInputFormat then, yes, we should make this filter public. If that's the case, please submit a patch.

Thanks,

Doug
