nutch-user  

Re: Nutch spider trap detection

Dennis Kubes
Sun, 29 Jun 2008 15:22:42 -0700

There are some regexes in the URL normalizers, and there is some code in DOMContentUtils that deals with recursion.
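
To give a rough idea of what a regex-based check for this kind of trap can look like, here is a minimal sketch. It is not code taken from Nutch: the class name TrapCheck, the method looksLikeTrap, and the pattern itself are assumptions made up for illustration. It flags URLs in which the same slash-delimited path segment shows up three or more times in alternating positions, which is a common symptom of a crawler looping through relative links.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the Nutch API.
class TrapCheck {
    // Assumed illustrative pattern: capture one path segment ("/name"),
    // then require it to reappear twice more, each time after exactly
    // one intervening segment, e.g. /a/b/a/b/a/
    private static final Pattern REPEATED_SEGMENT =
        Pattern.compile("(/[^/]+)/[^/]+\\1/[^/]+\\1/");

    // Returns true if the URL contains the repeating-segment shape above.
    static boolean looksLikeTrap(String url) {
        return REPEATED_SEGMENT.matcher(url).find();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeTrap("http://example.com/a/b/a/b/a/b/"));
        System.out.println(looksLikeTrap("http://example.com/docs/api/index.html"));
    }
}
```

A check like this is only a heuristic: it will miss traps that generate ever-changing segments (calendars, session ids) and may reject a few legitimate URLs, so it is usually combined with a crawl depth or URL length limit.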

Dennis

brainstorm wrote:
Hi!

I guess it is implemented, but I cannot find it in the Nutch API
docs or on the wiki :-/ ... Is there any mechanism in Nutch to
detect spider traps [1]?

Thanks,
Roman

[1] http://en.wikipedia.org/wiki/Spider_trap