Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The following page has been changed by JakeVanderdray:
http://wiki.apache.org/nutch/WritingPlugins

------------------------------------------------------------------------------
   * [http://lucene.apache.org/nutch/apidocs/org/apache/nutch/protocol/Protocol.html Protocol] -- Protocol implementations allow Nutch to use different protocols (ftp, http, etc.) to fetch documents.
   * [http://lucene.apache.org/nutch/apidocs/org/apache/nutch/net/URLFilter.html URLFilter] -- URLFilter implementations limit the URLs that Nutch attempts to fetch. The [http://lucene.apache.org/nutch/apidocs/org/apache/nutch/net/RegexURLFilter.html RegexURLFilter] distributed with Nutch provides a great deal of control over which URLs Nutch crawls. However, if you have very complicated rules about which URLs you want to crawl, you can write your own implementation.

+ == Setup ==
+
+ You need to start by [http://www.apache.org/dev/version-control.html#anon-svn downloading] the Nutch source code. Once you have it, make sure it compiles as is before you make any changes.
+
  <<< PluginCentral
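
To illustrate the URLFilter extension point described above, here is a minimal sketch of a custom filter. It assumes the Nutch interface exposes a single `filter(String)` method that returns the URL to keep it and `null` to reject it. So that the example compiles without the Nutch jars, it uses a hypothetical stand-in interface, `SimpleUrlFilter`, instead of the real `org.apache.nutch.net.URLFilter`; `SuffixUrlFilter` and `UrlFilterDemo` are likewise illustrative names, not part of Nutch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Small demo driver (hypothetical, not part of Nutch). */
class UrlFilterDemo {
    public static void main(String[] args) {
        SimpleUrlFilter filter =
            new SuffixUrlFilter(Arrays.asList(".gif", ".jpg", ".zip"));
        String[] urls = {
            "http://example.com/index.html",
            "http://example.com/logo.GIF"
        };
        for (String url : urls) {
            String result = filter.filter(url);
            System.out.println(url + " -> "
                + (result == null ? "rejected" : "accepted"));
        }
    }
}

/**
 * Hypothetical stand-in for Nutch's URLFilter interface.
 * Assumed contract: return the URL to accept it, or null to reject it.
 */
interface SimpleUrlFilter {
    String filter(String urlString);
}

/** Rejects any URL that ends with one of the blocked suffixes. */
class SuffixUrlFilter implements SimpleUrlFilter {
    private final List<String> blockedSuffixes = new ArrayList<>();

    SuffixUrlFilter(List<String> suffixes) {
        // Normalize once so that matching is case-insensitive.
        for (String suffix : suffixes) {
            blockedSuffixes.add(suffix.toLowerCase(Locale.ROOT));
        }
    }

    @Override
    public String filter(String urlString) {
        if (urlString == null) {
            return null; // nothing to fetch
        }
        String lower = urlString.toLowerCase(Locale.ROOT);
        for (String suffix : blockedSuffixes) {
            if (lower.endsWith(suffix)) {
                return null; // rejected: the crawler should skip this URL
            }
        }
        return urlString; // accepted unchanged
    }
}
```

A real plugin would implement the Nutch interface itself and be registered through the plugin system; the matching logic inside `filter` could stay the same under the assumed contract.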
