Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "DebugTool" page has been changed by ChrisMattmann:
http://wiki.apache.org/nutch/DebugTool

New page:
Based on some conversations on the list:

We've gathered some requirements for a Debug Tool that could help users see 
precisely what decisions Nutch is making while it navigates the URL space. So 
far, here's what we have, primarily from Ken Krugler, along with the others 
(Markus Jelsma, Chris Mattmann, Lewis John McGibbney) who took part in those 
list conversations:

The tool should generate enough information to answer all of the "is it X" 
questions that come up during a user's crawl. For example:

- which URLs were put on the fetch list, versus skipped.
- which fetched documents were truncated.
- which URLs in a parsed page were skipped because of the max outlinks per 
page limit.
- which URLs were filtered out by regex.
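
To make the discussion concrete, here is a minimal sketch of what a per-URL 
decision log covering the items above might look like. It is only an 
illustration under stated assumptions: the class CrawlDecisionLog, the 
Decision enum, and the record/explain/urlsWith methods are hypothetical names 
and are not part of Nutch, and the URLs and detail strings are made up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names are part of Nutch.
final class CrawlDecisionLog {

    // Decision kinds taken from the requirements listed above.
    enum Decision {
        PUT_ON_FETCH_LIST,
        SKIPPED_FROM_FETCH_LIST,
        CONTENT_TRUNCATED,
        OUTLINK_OVER_LIMIT,
        FILTERED_BY_REGEX
    }

    // One recorded decision about one URL.
    static final class Entry {
        final String url;
        final Decision decision;
        final String detail;

        Entry(String url, Decision decision, String detail) {
            this.url = url;
            this.decision = decision;
            this.detail = detail;
        }

        @Override
        public String toString() {
            return decision + "\t" + url + "\t" + detail;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    // Called by whichever crawl step made the decision.
    void record(String url, Decision decision, String detail) {
        entries.add(new Entry(url, decision, detail));
    }

    // Every decision recorded for one URL, in the order it was made.
    List<Entry> explain(String url) {
        List<Entry> result = new ArrayList<>();
        for (Entry e : entries) {
            if (e.url.equals(url)) {
                result.add(e);
            }
        }
        return result;
    }

    // All URLs that received a given decision, e.g. everything filtered by regex.
    List<String> urlsWith(Decision decision) {
        List<String> result = new ArrayList<>();
        for (Entry e : entries) {
            if (e.decision == decision) {
                result.add(e.url);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        CrawlDecisionLog log = new CrawlDecisionLog();
        // Made-up URLs and detail strings, for illustration only.
        log.record("http://example.com/a", Decision.PUT_ON_FETCH_LIST, "selected");
        log.record("http://example.com/a", Decision.CONTENT_TRUNCATED, "over size limit");
        log.record("http://example.com/b", Decision.FILTERED_BY_REGEX, "matched exclusion");
        log.record("http://example.com/c", Decision.OUTLINK_OVER_LIMIT, "past max outlinks");

        // Answer an "is it X" question for a single URL.
        for (Entry e : log.explain("http://example.com/a")) {
            System.out.println(e);
        }
    }
}
```

With something like this, "was my URL filtered by regex?" becomes a lookup 
via explain(url), and urlsWith(decision) gives the per-category lists. A real 
tool would need to persist these entries across crawl steps rather than keep 
them in memory.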

Please add more requirements and discussion here.
