The following page has been changed by JakeVanderdray:
http://wiki.apache.org/nutch/AboutPlugins

New page:
Nutch's plugin system is based on the one used in Eclipse 2.x.  Plugins are 
central to how Nutch works: all of the parsing, indexing and searching that 
Nutch does is carried out by plugins.

In writing a plugin, you're actually providing one or more ''extensions'' of 
the existing ''extension-points''. The core Nutch ''extension-points'' are 
themselves defined in a plugin, the 
[http://lucene.apache.org/nutch/apidocs/org/apache/nutch/plugin/ExtensionPoint.html NutchExtensionPoints] 
plugin (they are listed in the !NutchExtensionPoints 
[http://svn.apache.org/viewcvs.cgi/lucene/nutch/trunk/src/plugin/nutch-extensionpoints/plugin.xml?view=markup plugin.xml] 
file). Each ''extension-point'' defines an interface that must be 
implemented by the ''extension''. The core extension points are:

 * [http://lucene.apache.org/nutch/apidocs/org/apache/nutch/clustering/OnlineClusterer.html OnlineClusterer] 
-- An extension point interface for online search results 
clustering algorithms (from javadoc).
 * [http://lucene.apache.org/nutch/apidocs/org/apache/nutch/indexer/IndexingFilter.html IndexingFilter] 
-- Permits one to add metadata to the indexed fields. All 
plugins found which implement this extension point are run sequentially on the 
parse (from javadoc).
 * [http://lucene.apache.org/nutch/apidocs/org/apache/nutch/ontology/Ontology.html Ontology]
 * [http://lucene.apache.org/nutch/apidocs/org/apache/nutch/parse/Parser.html Parser] 
-- Parser implementations read through fetched documents in order to 
extract data to be indexed.  This is what you need to implement if you want 
Nutch to be able to parse a new type of content, or extract more data from 
currently parseable content.
 * [http://lucene.apache.org/nutch/apidocs/org/apache/nutch/parse/HtmlParseFilter.html HtmlParseFilter] 
-- Permits one to add additional metadata to HTML parses 
(from javadoc).
 * [http://lucene.apache.org/nutch/apidocs/org/apache/nutch/protocol/Protocol.html Protocol] 
-- Protocol implementations allow Nutch to use different protocols 
(ftp, http, etc.) to fetch documents.
 * [http://lucene.apache.org/nutch/apidocs/org/apache/nutch/searcher/QueryFilter.html QueryFilter] 
-- Extension point for query translation. Permits one to add 
metadata to a query (from javadoc).
 * [http://lucene.apache.org/nutch/apidocs/org/apache/nutch/net/URLFilter.html URLFilter] 
-- URLFilter implementations limit the URLs that Nutch attempts to 
fetch.  The 
[http://lucene.apache.org/nutch/apidocs/org/apache/nutch/net/RegexURLFilter.html RegexURLFilter] 
distributed with Nutch provides a great deal of control over 
which URLs Nutch crawls. If you have very complicated rules about which 
URLs you want to crawl, you can write your own implementation.
 * [http://svn.apache.org/viewcvs.cgi/lucene/nutch/trunk/src/java/org/apache/nutch/analysis/NutchAnalyzer.java?view=markup NutchAnalyzer] 
-- An extension point that provides some language-specific 
analyzers (see MultiLingualSupport proposal). ''Since it is still in 
development, it is not in the released javadoc''.
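
To make the extension-point idea concrete, here is a minimal sketch in plain Java. It does not use the Nutch API: the `UrlFilter` interface is a simplified stand-in for an extension point such as URLFilter, and `HttpOnlyFilter`, `NoQueryFilter`, `applyAll` and `defaultFilters` are hypothetical names made up for illustration. It also assumes that extensions of one point are applied one after another, as the IndexingFilter description above says.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the Nutch API: the "extension point" is an
// interface and each "extension" is a class that implements it.
class ExtensionPointSketch {

    /** Simplified stand-in for an extension point such as URLFilter. */
    interface UrlFilter {
        /** Returns the URL to keep it, or null to reject it. */
        String filter(String url);
    }

    /** One extension: rejects anything that is not plain http. */
    static class HttpOnlyFilter implements UrlFilter {
        public String filter(String url) {
            return url.startsWith("http://") ? url : null;
        }
    }

    /** Another extension: rejects URLs that contain a query string. */
    static class NoQueryFilter implements UrlFilter {
        public String filter(String url) {
            return url.contains("?") ? null : url;
        }
    }

    /** Runs every extension in order; the first rejection ends the chain. */
    static String applyAll(List<UrlFilter> filters, String url) {
        String current = url;
        for (UrlFilter f : filters) {
            current = f.filter(current);
            if (current == null) {
                return null;
            }
        }
        return current;
    }

    /** Builds the list of extensions used in this example. */
    static List<UrlFilter> defaultFilters() {
        List<UrlFilter> filters = new ArrayList<>();
        filters.add(new HttpOnlyFilter());
        filters.add(new NoQueryFilter());
        return filters;
    }

    public static void main(String[] args) {
        List<UrlFilter> filters = defaultFilters();
        System.out.println(applyAll(filters, "http://example.com/page")); // kept
        System.out.println(applyAll(filters, "ftp://example.com/file"));  // rejected, prints null
    }
}
```

A real plugin would implement the interface published by the extension point itself (see the javadoc links above) rather than one of its own.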

== Source Files ==

You'll find the following inside of a plugin source directory:

 * A plugin.xml file that tells Nutch about your plugin.
 * A build.xml file that tells Ant how to build your plugin.
 * The source code of your plugin.

== Getting Nutch to Use a Plugin ==

To get Nutch to use a given plugin, edit your conf/nutch-site.xml file and 
add the name of the plugin to the plugin.includes list.
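
As a rough illustration of how such an include list can select plugins, the sketch below assumes the plugin.includes value is written as plugin names separated by `|` and is interpreted as a regular expression that must match the whole plugin name. That interpretation is an assumption, not something stated on this page, so check the description of plugin.includes in your own configuration. The class, the method and the plugin names (including `my-plugin`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: selects plugin names using an include pattern.
// This is not Nutch code; it only uses java.util.regex.
class PluginIncludesSketch {

    /** Returns the names that fully match the include pattern, in input order. */
    static List<String> selectPlugins(String includes, List<String> available) {
        Pattern pattern = Pattern.compile(includes);
        List<String> selected = new ArrayList<>();
        for (String name : available) {
            if (pattern.matcher(name).matches()) {
                selected.add(name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Made-up plugin names; "my-plugin" stands for the plugin you wrote.
        List<String> available =
            Arrays.asList("protocol-http", "parse-html", "parse-pdf", "my-plugin");
        // Assumed form of the plugin.includes value after adding "my-plugin".
        String includes = "protocol-http|parse-html|my-plugin";
        System.out.println(selectPlugins(includes, available));
    }
}
```

Under this assumption, a plugin whose name is missing from the value (here `parse-pdf`) is simply not selected.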

<<< See also: WritingPluginExample

<<< See also: HowToContribute

<<< PluginCentral

