Hi all, I am a developer and I want to write a Nutch plugin that uses jsoup to parse HTML files. To get a better feel for it, I first need to understand how the whole thing works.
What I have found so far is URLFilter, URLFilterChecker and URLFilters.java, but I get confused when I see files such as RegexURLFilter and PrefixURLFilter. Could anybody please tell me exactly which Java files handle URL filtering and the politeness of the crawler? Awaiting a positive reply. Thanks in advance.
From: Naveen Shukla

