Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "DebugTool" page has been changed by SebastianNagel:
http://wiki.apache.org/nutch/DebugTool?action=diff&rev1=3&rev2=4

Comment:
completions

   1. which URLs were put on the fetch list, versus skipped.
   1. which fetched documents were truncated.
   1. which URLs in a parsed page were skipped, due to the max outlinks per page limit.
-  1. which URLs got filtered by regex
+  1. which URLs got filtered by regex, prefix, suffix, domain filters
+  1. exclusions by robots directives
+    * robots.txt
+    * outlinks skipped by meta nofollow
+  1. URLs mapped to another URL
+    * URL normalization
+    * redirects
  
  Please add more requirements and discussion here.
  
