Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "AboutPlugins" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/AboutPlugins?action=diff&rev1=8&rev2=9

  Nutch's plugin system is based on the one used in 
[[http://www.eclipse.org/articles/Article-Plug-in-architecture/plugin_architecture.html|Eclipse
 2.x]].  Plugins are central to how Nutch works.  All of the parsing, indexing 
and searching that Nutch does is accomplished by plugins.
  
- In writing a plugin, you're actually providing one or more ''extensions'' of 
the existing ''extension-points'' . The core Nutch ''extension-points'' are 
themselves defined in a plugin, the 
[[http://nutch.apache.org/apidocs-1.1/org/apache/nutch/plugin/ExtensionPoint.html|NutchExtensionPoints]]
 plugin (they are listed in the !NutchExtensionPoints 
[[http://svn.apache.org/viewcvs.cgi/nutch/trunk/src/plugin/nutch-extensionpoints/plugin.xml?view=markup|plugin.xml]]
 file). Each ''extension-point'' defines an interface that must be implemented 
by the ''extension''. The core extension points are:
+ In writing a plugin, you're actually providing one or more ''extensions'' of 
the existing ''extension-points''. The core Nutch ''extension-points'' are 
themselves defined in a plugin, the 
[[http://nutch.apache.org/apidocs-1.4/org/apache/nutch/plugin/ExtensionPoint.html|NutchExtensionPoints]]
 plugin (they are listed in the !NutchExtensionPoints 
[[http://svn.apache.org/viewcvs.cgi/nutch/trunk/src/plugin/nutch-extensionpoints/plugin.xml?view=markup|plugin.xml]]
 file). Each ''extension-point'' defines an interface that must be implemented 
by the ''extension''. The core extension points are:
  
-  * 
[[http://nutch.apache.org/apidocs-1.1/org/apache/nutch/clustering/OnlineClusterer.html|OnlineClusterer]]
 -- An extension point interface for online search results clustering 
algorithms (from javadoc).
-  * 
[[http://nutch.apache.org/apidocs-1.1/org/apache/nutch/indexer/IndexingFilter.html|IndexingFilter]]
 -- Permits one to add metadata to the indexed fields. All plugins found which 
implement this extension point are run sequentially on the parse (from javadoc).
+  * 
[[http://nutch.apache.org/apidocs-1.4/org/apache/nutch/indexer/IndexingFilter.html|IndexingFilter]]
 -- Permits one to add metadata to the indexed fields. All plugins found which 
implement this extension point are run sequentially on the parse (from javadoc).
-  * 
[[http://nutch.apache.org/apidocs-1.1/org/apache/nutch/ontology/Ontology.html|Ontology]]
-  * 
[[http://nutch.apache.org/apidocs-1.1/org/apache/nutch/parse/Parser.html|Parser]]
 -- Parser implementations read through fetched documents in order to extract 
data to be indexed.  This is what you need to implement if you want Nutch to be 
able to parse a new type of content, or extract more data from currently 
parseable content.
+  * 
[[http://nutch.apache.org/apidocs-1.4/org/apache/nutch/parse/Parser.html|Parser]]
 -- Parser implementations read through fetched documents in order to extract 
data to be indexed.  This is what you need to implement if you want Nutch to be 
able to parse a new type of content, or extract more data from currently 
parseable content.
-  * 
[[http://nutch.apache.org/apidocs-1.1/org/apache/nutch/parse/HtmlParseFilter.html|HtmlParseFilter]]
 -- Permits one to add additional metadata to HTML parses (from javadoc).
+  * 
[[http://nutch.apache.org/apidocs-1.4/org/apache/nutch/parse/HtmlParseFilter.html|HtmlParseFilter]]
 -- Permits one to add additional metadata to HTML parses (from javadoc).
-  * 
[[http://nutch.apache.org/apidocs-1.1/org/apache/nutch/protocol/Protocol.html|Protocol]]
 -- Protocol implementations allow nutch to use different protocols (ftp, http, 
etc.) to fetch documents.
+  * 
[[http://nutch.apache.org/apidocs-1.4/org/apache/nutch/protocol/Protocol.html|Protocol]]
 -- Protocol implementations allow Nutch to use different protocols (ftp, http, 
etc.) to fetch documents.
-  * 
[[http://nutch.apache.org/apidocs-1.1/org/apache/nutch/searcher/QueryFilter.html|QueryFilter]]
 -- Extension point for query translation. Permits one to add metadata to a 
query (from javadoc).
-  * 
[[http://nutch.apache.org/apidocs-1.1/org/apache/nutch/net/URLFilter.html|URLFilter]]
 -- URLFilter implementations limit the URLs that nutch attempts to fetch.  The 
[[http://nutch.apache.org/apidocs-1.1/org/apache/nutch/net/RegexURLFilter.html|RegexURLFilter]]
 distributed with Nutch provides a great deal of control over what URLs Nutch 
crawls, however if you have very complicated rules about what URLs you want to 
crawl, you can write your own implementation.
+  * 
[[http://nutch.apache.org/apidocs-1.4/org/apache/nutch/net/URLFilter.html|URLFilter]]
 -- URLFilter implementations limit the URLs that Nutch attempts to fetch.  The 
[[http://nutch.apache.org/apidocs-1.1/org/apache/nutch/net/RegexURLFilter.html|RegexURLFilter]]
 distributed with Nutch provides a great deal of control over what URLs Nutch 
crawls. However, if you have very complicated rules about what URLs you want to 
crawl, you can write your own implementation.
+  * 
[[http://nutch.apache.org/apidocs-1.4/org/apache/nutch/net/URLNormalizer.html|URLNormalizer]]
 -- Interface used to convert URLs to normal form and optionally perform 
substitutions.
+  * 
[[http://nutch.apache.org/apidocs-1.4/org/apache/nutch/scoring/ScoringFilter.html|ScoringFilter]]
 -- A contract defining behavior of scoring plugins. A scoring filter will 
manipulate scoring variables in CrawlDatum and in resulting search indexes. 
Filters can be chained in a specific order, to provide multi-stage scoring 
adjustments. 
+  * 
[[http://nutch.apache.org/apidocs-1.4/org/apache/nutch/segment/SegmentMergeFilter.html|SegmentMergeFilter]]
 -- Interface used to filter segments during segment merge. It allows filtering 
on more sophisticated criteria than just URLs. In particular, it allows 
filtering based on metadata collected while parsing a page. 
   * 
[[http://svn.apache.org/viewcvs.cgi/nutch/trunk/src/java/org/apache/nutch/analysis/NutchAnalyzer.java?view=markup|NutchAnalyzer]]
 -- An extension point that provides some language-specific analyzers (see 
MultiLingualSupport proposal). ''Since it is still in development, it is not in 
the released javadoc''.
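
As a rough illustration of what implementing one of these extension points involves, the sketch below mirrors the URLFilter contract as I understand it (return the URL to keep it, return null to reject it) and the idea that several extensions of one extension point are run sequentially. It uses a stand-in interface so that it compiles without Nutch on the classpath. The names UrlFilterSketch, HostSuffixUrlFilter and UrlFilterChain are hypothetical and not part of Nutch; a real plugin would implement org.apache.nutch.net.URLFilter and be declared in its own plugin.xml.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.List;
import java.util.Locale;

/** Stand-in (assumption) for the URLFilter contract: return the URL to keep it, null to reject it. */
interface UrlFilterSketch {
    String filter(String urlString);
}

/** Hypothetical filter that only accepts URLs whose host matches an allowed domain suffix. */
class HostSuffixUrlFilter implements UrlFilterSketch {
    private final List<String> allowedSuffixes;

    HostSuffixUrlFilter(List<String> allowedSuffixes) {
        this.allowedSuffixes = allowedSuffixes;
    }

    @Override
    public String filter(String urlString) {
        try {
            String host = new URI(urlString).getHost();
            if (host == null) {
                return null; // no host component: reject
            }
            host = host.toLowerCase(Locale.ROOT);
            for (String suffix : allowedSuffixes) {
                // Accept the domain itself or any subdomain of it.
                if (host.equals(suffix) || host.endsWith("." + suffix)) {
                    return urlString;
                }
            }
            return null;
        } catch (URISyntaxException e) {
            return null; // unparseable URL: reject
        }
    }
}

/** Hypothetical chain: filters run one after another and the first rejection wins. */
class UrlFilterChain implements UrlFilterSketch {
    private final List<UrlFilterSketch> filters;

    UrlFilterChain(List<UrlFilterSketch> filters) {
        this.filters = filters;
    }

    @Override
    public String filter(String urlString) {
        String current = urlString;
        for (UrlFilterSketch f : filters) {
            current = f.filter(current);
            if (current == null) {
                return null;
            }
        }
        return current;
    }
}

class UrlFilterDemo {
    public static void main(String[] args) {
        UrlFilterSketch chain = new UrlFilterChain(List.of(
                new HostSuffixUrlFilter(List.of("apache.org")),
                url -> url.contains("?") ? null : url)); // second stage: drop URLs with a query string
        for (String url : List.of(
                "http://nutch.apache.org/apidocs-1.4/",
                "http://wiki.apache.org/nutch/AboutPlugins?action=diff",
                "http://example.com/")) {
            System.out.println(url + " -> " + chain.filter(url));
        }
    }
}
```

Returning the (possibly unchanged) URL instead of a boolean lets each stage hand its result to the next one, which is why the chain only needs to check for null.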
  
  
