[Nutch Wiki] Update of "WritingPlugins" by JeromeCharron

Apache Wiki Mon, 14 Nov 2005 14:31:42 -0800

The following page has been changed by JeromeCharron:
http://wiki.apache.org/nutch/WritingPlugins

The comment on the change is:
Complete the list of nutch-core extension points.

------------------------------------------------------------------------------
== Introduction ==

- Writing a plugin allows you to extend and change Nutch without having to
modify the core system. In writing a plugin, you're actually writing an
implementation of one of the following Nutch interfaces (Please update this
list with any I've missed):
+ Writing a plugin allows you to extend and change Nutch without having to
modify the core system. In writing a plugin, you're actually providing one or
more ''extensions'' of the existing ''extension-points''. The core Nutch
''extension-points'' are themselves defined in a plugin, the
NutchExtensionPoints plugin (they are listed in the NutchExtensionPoints
[http://svn.apache.org/viewcvs.cgi/lucene/nutch/trunk/src/plugin/nutch-extensionpoints/plugin.xml?view=markup
plugin.xml] file). Each ''extension-point'' defines an interface that must be
implemented by the ''extension''. The Nutch core extension points are (please
update this list with any I've missed):

+ *
[http://lucene.apache.org/nutch/apidocs/org/apache/nutch/clustering/OnlineClusterer.html
OnlineClusterer] -- An extension point interface for online search results
clustering algorithms (from javadoc).
+ *
[http://lucene.apache.org/nutch/apidocs/org/apache/nutch/indexer/IndexingFilter.html
IndexingFilter] -- Permits one to add metadata to the indexed fields. All
plugins found which implement this extension point are run sequentially on the
parse (from javadoc).
+ *
[http://lucene.apache.org/nutch/apidocs/org/apache/nutch/ontology/Ontology.html
Ontology]
* [http://lucene.apache.org/nutch/apidocs/org/apache/nutch/parse/Parser.html
Parser] -- Parser implementations read through fetched documents in order to
extract data to be indexed. This is what you need to implement if you want
Nutch to be able to parse a new type of content, or extract more data from
currently parseable content.
+ *
[http://lucene.apache.org/nutch/apidocs/org/apache/nutch/parse/HtmlParseFilter.html
HtmlParseFilter] -- Permits one to add additional metadata to HTML parses
(from javadoc).
*
[http://lucene.apache.org/nutch/apidocs/org/apache/nutch/protocol/Protocol.html
Protocol] -- Protocol implementations allow Nutch to use different protocols
(ftp, http, etc.) to fetch documents.
+ *
[http://lucene.apache.org/nutch/apidocs/org/apache/nutch/searcher/QueryFilter.html
QueryFilter] -- Extension point for query translation. Permits one to add
metadata to a query (from javadoc).
*
[http://lucene.apache.org/nutch/apidocs/org/apache/nutch/net/URLFilter.html
URLFilter] -- URLFilter implementations limit the URLs that Nutch attempts to
fetch. The
[http://lucene.apache.org/nutch/apidocs/org/apache/nutch/net/RegexURLFilter.html
RegexURLFilter] distributed with Nutch provides a great deal of control over
which URLs Nutch crawls. However, if you have very complicated rules about
which URLs you want to crawl, you can write your own implementation.
+ *
[http://svn.apache.org/viewcvs.cgi/lucene/nutch/trunk/src/java/org/apache/nutch/analysis/NutchAnalyzer.java?view=markup
NutchAnalyzer] -- An extension point for providing language-specific
analyzers (see the MultiLingualSupport proposal). ''Since it is still in
development, it is not yet in the released javadoc''.
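
The relationship between an ''extension-point'' (an interface) and an ''extension'' (an implementation of it) can be sketched in plain Java. This is only an illustration, not Nutch code: the `UrlFilterPoint` interface, the `SuffixUrlFilter` class and the "return null to reject" convention are all assumptions made so that the example compiles on its own without the Nutch jars. Check the javadoc linked above for the real interface signatures.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for an extension-point interface.
// It is NOT the real org.apache.nutch.net.URLFilter; it only mimics the idea.
interface UrlFilterPoint {
    // Assumed convention: return the URL to accept it, or null to reject it.
    String filter(String url);
}

// Hypothetical extension: rejects URLs that end with one of the given suffixes.
class SuffixUrlFilter implements UrlFilterPoint {
    private final List<String> rejectedSuffixes;

    // Suffixes are expected in lower case, e.g. ".gif".
    SuffixUrlFilter(String... rejectedSuffixes) {
        this.rejectedSuffixes = Arrays.asList(rejectedSuffixes);
    }

    @Override
    public String filter(String url) {
        if (url == null) {
            return null;
        }
        // Compare case-insensitively by lower-casing the URL first.
        String lower = url.toLowerCase(Locale.ROOT);
        for (String suffix : rejectedSuffixes) {
            if (lower.endsWith(suffix)) {
                return null; // rejected
            }
        }
        return url; // accepted unchanged
    }
}

class PluginSketch {
    public static void main(String[] args) {
        // The caller only knows the extension-point interface,
        // not the concrete extension behind it.
        UrlFilterPoint filter = new SuffixUrlFilter(".gif", ".jpg");
        System.out.println(filter.filter("http://example.org/index.html"));
        System.out.println(filter.filter("http://example.org/logo.GIF"));
    }
}
```

In a real plugin the class would implement the Nutch interface of the chosen extension point and be declared as an extension in the plugin's own plugin.xml, rather than being instantiated by hand as in `main` above.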

== Setup ==
