Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The following page has been changed by MattKangas:
http://wiki.apache.org/nutch/DissectingTheNutchCrawler

------------------------------------------------------------------------------
  
  The main ways to configure the Nutch crawler are as follows:
  
-  1. Configuration files. Default values are in nutch-default.xml, and you 
should override them in nutch-site.xml. [[BR]][[BR]]
+  1. Configuration files. Default values are in nutch-default.xml, and you 
should override them in nutch-site.xml. 
   1. URLFilter interface. By default, the class 
{{{net.nutch.net.RegexURLFilter}}} is used, which reads regular expression 
patterns from regex-urlfilter.txt. So, you can: 
     *  Edit that file to tune its behavior
-    *  Or, write a new class that implements {{{net.nutch.net.URLFilter}}}, 
and change nutch-site.xml to use it. [[BR]][[BR]]
+    *  Or, write a new class that implements {{{net.nutch.net.URLFilter}}}, 
and change nutch-site.xml to use it. 
-  1. Protocol interface. To add support for a new protocol, write or add a 
plugin to the "plugins" directory. To change protocol behavior, modify the 
appropriate plugin. [[BR]][[BR]]
+  1. Protocol interface. To add support for a new protocol, write or add a 
plugin to the "plugins" directory. To change protocol behavior, modify the 
appropriate plugin. 
-  1. Parser interface. As for Protocol, you should add/create a plugin for any 
new content-types. Otherwise, you will need to replace the appropriate plugin 
if you want to modify its behavior. [[BR]][[BR]]
+  1. Parser interface. As for Protocol, you should add/create a plugin for any 
new content-types. Otherwise, you will need to replace the appropriate plugin 
if you want to modify its behavior. 
   1. If you need to make other changes, refer to our discussion of 
'''Fetcher''' and '''FetchListTool'''. Consider subclassing these classes, 
overriding the appropriate method, then calling your class from the "nutch" 
script using the full class path.
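
  As a rough illustration of the "write a new class" option for URL filtering, the sketch below shows what a custom filter might look like. It does not compile against Nutch itself: the nested interface {{{UrlFilterSketch}}} is a hypothetical stand-in for {{{net.nutch.net.URLFilter}}}, assumed here to take a URL string and return it to accept the URL or null to reject it. The class names {{{UrlFilterDemo}}} and {{{SuffixUrlFilter}}} and the suffix list are likewise invented for the example, so check the real interface in your Nutch version before adapting it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class UrlFilterDemo {

    // Hypothetical stand-in for net.nutch.net.URLFilter.
    // Assumed contract: return the URL to accept it, or null to reject it.
    interface UrlFilterSketch {
        String filter(String url);
    }

    // Rejects any URL whose lower-cased form ends with one of the blocked suffixes.
    static class SuffixUrlFilter implements UrlFilterSketch {
        private final List<String> blockedSuffixes;

        SuffixUrlFilter(List<String> blockedSuffixes) {
            this.blockedSuffixes = blockedSuffixes;
        }

        @Override
        public String filter(String url) {
            if (url == null) {
                return null;
            }
            String lower = url.toLowerCase(Locale.ROOT);
            for (String suffix : blockedSuffixes) {
                if (lower.endsWith(suffix)) {
                    return null; // rejected
                }
            }
            return url; // accepted unchanged
        }
    }

    public static void main(String[] args) {
        UrlFilterSketch filter = new SuffixUrlFilter(Arrays.asList(".gif", ".jpg", ".zip"));
        String[] candidates = {
            "http://example.com/index.html",
            "http://example.com/logo.GIF"
        };
        for (String url : candidates) {
            String result = filter.filter(url);
            System.out.println(url + " : " + (result == null ? "rejected" : "accepted"));
        }
    }
}
```

  A real filter would implement the actual Nutch interface and be named in nutch-site.xml, as described in the list above. For simple cases, editing regex-urlfilter.txt is probably less work than writing a class.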
  
  


_______________________________________________
Nutch-cvs mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-cvs
