Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The following page has been changed by JakeVanderdray: http://wiki.apache.org/nutch/WritingPlugins ------------------------------------------------------------------------------ * [http://lucene.apache.org/nutch/apidocs/org/apache/nutch/parse/Parser.html Parser] -- Parser implementations read through fetched documents in order to extract data to be indexed. This is what you need to implement if you want Nutch to be able to parse a new type of content, or extract more data from currently parseable content. * [http://lucene.apache.org/nutch/apidocs/org/apache/nutch/protocol/Protocol.html Protocol] -- Protocol implementations allow nutch to use different protocols (ftp, http, etc.) to fetch documents. + * [http://lucene.apache.org/nutch/apidocs/org/apache/nutch/net/URLFilter.html URLFilter] -- URLFilter implementations limit the URLs that nutch attempts to fetch. The [http://lucene.apache.org/nutch/apidocs/org/apache/nutch/net/RegexURLFilter.html RegexURLFilter] distributed with Nutch provides a great deal of control over what URLs Nutch crawls, however if you have very complicated rules about what URLs you want to crawl, you can write your own implementation. <<< PluginCentral
