Hi All,

I am a developer who wants to write a Nutch plugin that uses jsoup to parse
HTML files. To get a better feel for it, I need to understand how the whole
thing works.

So far I have found URLFilter, URLFilterChecker and URLFilters.java, but I
get confused when I see files such as RegexURLFilter and PrefixURLFilter.

Can anybody please tell me exactly which Java files handle URL filtering
and the politeness of the crawler?

Awaiting a positive reply.

Thanks in advance.

From:

Naveen Shukla
