Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The "DebugTool" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/DebugTool?action=diff&rev1=4&rev2=5

  It should be possible to generate information that would have answered all of the "is it X" questions that came up during a user's crawl. E.g.

   1. which URLs were put on the fetch list, versus skipped.
-  1. which fetched documents were truncated.
+  1. which fetched documents were truncated.
+   - The code currently has primitive logging for all parse plugins to log verification of truncation to stdout. What more could we do here? It is a common problem so would be good to improve this area.
   1. which URLs in a parsed page were skipped, due to the max outlinks per page limit.
   1. which URLs got filtered by regex, prefix, suffix, domain filters
   1. exclusions by robots directives
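
One possible direction for the truncation question raised above is to emit a structured, per-URL record instead of free-form stdout logging. The sketch below is illustrative only and is not Nutch code: the function names `is_truncated` and `truncation_report` are hypothetical, and it assumes truncation can be detected by comparing the length the server declared (for example in a Content-Length header) with the number of bytes actually fetched.

```python
from typing import Optional


def is_truncated(declared_length: Optional[int], fetched_bytes: int) -> bool:
    """Return True when fewer bytes were fetched than the server declared.

    Hypothetical helper: assumes the declared length is available as an int.
    """
    if declared_length is None:
        # No declared length, so truncation cannot be determined this way.
        return False
    return fetched_bytes < declared_length


def truncation_report(url: str, declared_length: Optional[int], fetched_bytes: int) -> str:
    """Build one tab-separated report line per fetched document."""
    status = "TRUNCATED" if is_truncated(declared_length, fetched_bytes) else "OK"
    return f"{status}\t{url}\tdeclared={declared_length}\tfetched={fetched_bytes}"


if __name__ == "__main__":
    # Example URLs and sizes are made up for illustration.
    print(truncation_report("http://example.com/big.pdf", 1000, 512))
    print(truncation_report("http://example.com/index.html", None, 512))
```

A tab-separated line per document would make it easy to filter for truncated fetches after a crawl, which is the kind of "is it X" question the page describes.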

