Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The "DebugTool" page has been changed by ChrisMattmann:
http://wiki.apache.org/nutch/DebugTool

New page:

Based on some conversations on the list, we've gathered some requirements for a Debug Tool that could help users know precisely what decisions Nutch makes while it navigates the URL space.

So far, here's what we have, primarily from Ken Krugler, along with the others (Markus Jelsma, Chris Mattmann, Lewis John McGibbney) participating in the above referenced thread:

It should be possible to generate information that would have answered all of the "is it X" questions that came up during a user's crawl. E.g.:

- which URLs were put on the fetch list, versus skipped
- which fetched documents were truncated
- which URLs in a parsed page were skipped due to the max outlinks per page limit
- which URLs got filtered by regex

Please add more requirements and discussion here.
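To make the "is it X" idea a bit more concrete, here is a minimal sketch of one way such decisions could be recorded and queried afterwards. It is not Nutch code: `Decision`, `DecisionLog`, `filter_outlinks`, the stage names, and the order in which the regex check and the outlink cap are applied are all assumptions made up for illustration.

```python
import re
from collections import namedtuple

# Hypothetical record type for illustration; not part of Nutch.
Decision = namedtuple("Decision", ["url", "stage", "outcome", "reason"])


class DecisionLog:
    """Collects one record per decision so questions can be answered after the crawl."""

    def __init__(self):
        self.records = []

    def record(self, url, stage, outcome, reason):
        self.records.append(Decision(url, stage, outcome, reason))

    def query(self, stage=None, outcome=None):
        # Return the records matching the given stage and/or outcome.
        return [r for r in self.records
                if (stage is None or r.stage == stage)
                and (outcome is None or r.outcome == outcome)]


def filter_outlinks(outlinks, exclude_patterns, max_outlinks, log):
    """Apply regex exclusion, then a per-page outlink cap, logging every decision.

    The ordering of the two checks is an assumption of this sketch.
    """
    kept = []
    for url in outlinks:
        matched = next((p for p in exclude_patterns if re.search(p, url)), None)
        if matched is not None:
            log.record(url, "urlfilter", "skipped", "matched regex " + matched)
        elif len(kept) >= max_outlinks:
            log.record(url, "parse", "skipped", "max outlinks per page reached")
        else:
            kept.append(url)
            log.record(url, "parse", "kept", "passed filters")
    return kept


# Example usage with made-up URLs and a made-up exclusion pattern.
log = DecisionLog()
kept = filter_outlinks(
    ["http://example.com/a", "http://example.com/b.gif",
     "http://example.com/c", "http://example.com/d"],
    [r"\.gif$"], 2, log)
regex_skipped = log.query(stage="urlfilter", outcome="skipped")
cap_skipped = log.query(stage="parse", outcome="skipped")
```

Keeping a reason string next to each outcome is what would let a user tell "filtered by regex" apart from "dropped by the outlinks limit" without re-running the crawl. The same record shape could presumably cover fetch list and truncation decisions too.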

