Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The following page has been changed by JeromeCharron:
http://wiki.apache.org/nutch/WritingPlugins

The comment on the change is:
Complete the list of nutch-core extension points.

------------------------------------------------------------------------------
 == Introduction ==
- Writing a plugin allows you to extend and change Nutch without having to modify the core system. In writing a plugin, you're actually writing an implementation of one of the following Nutch interfaces (Please update this list with any I've missed):
+ Writing a plugin allows you to extend and change Nutch without having to modify the core system. In writing a plugin, you're actually providing one or more ''extensions'' of the existing ''extension-points''. The core Nutch ''extension-points'' are themselves defined in a plugin, the NutchExtensionPoints plugin (they are listed in the NutchExtensionPoints [http://svn.apache.org/viewcvs.cgi/lucene/nutch/trunk/src/plugin/nutch-extensionpoints/plugin.xml?view=markup plugin.xml] file). Each ''extension-point'' defines an interface that must be implemented by the ''extension''. The Nutch core extension points are (please update this list with any I've missed):
+ * [http://lucene.apache.org/nutch/apidocs/org/apache/nutch/clustering/OnlineClusterer.html OnlineClusterer] -- An extension point interface for online search results clustering algorithms (from javadoc).
+ * [http://lucene.apache.org/nutch/apidocs/org/apache/nutch/indexer/IndexingFilter.html IndexingFilter] -- Permits one to add metadata to the indexed fields. All plugins found which implement this extension point are run sequentially on the parse (from javadoc).
+ * [http://lucene.apache.org/nutch/apidocs/org/apache/nutch/ontology/Ontology.html Ontology]
  * [http://lucene.apache.org/nutch/apidocs/org/apache/nutch/parse/Parser.html Parser] -- Parser implementations read through fetched documents in order to extract data to be indexed. This is what you need to implement if you want Nutch to be able to parse a new type of content, or extract more data from currently parseable content.
+ * [http://lucene.apache.org/nutch/apidocs/org/apache/nutch/parse/HtmlParseFilter.html HtmlParseFilter] -- Permits one to add additional metadata to HTML parses (from javadoc).
  * [http://lucene.apache.org/nutch/apidocs/org/apache/nutch/protocol/Protocol.html Protocol] -- Protocol implementations allow Nutch to use different protocols (ftp, http, etc.) to fetch documents.
+ * [http://lucene.apache.org/nutch/apidocs/org/apache/nutch/searcher/QueryFilter.html QueryFilter] -- Extension point for query translation. Permits one to add metadata to a query (from javadoc).
  * [http://lucene.apache.org/nutch/apidocs/org/apache/nutch/net/URLFilter.html URLFilter] -- URLFilter implementations limit the URLs that Nutch attempts to fetch. The [http://lucene.apache.org/nutch/apidocs/org/apache/nutch/net/RegexURLFilter.html RegexURLFilter] distributed with Nutch provides a great deal of control over which URLs Nutch crawls; however, if you have very complicated rules about which URLs you want to crawl, you can write your own implementation.
+ * [http://svn.apache.org/viewcvs.cgi/lucene/nutch/trunk/src/java/org/apache/nutch/analysis/NutchAnalyzer.java?view=markup NutchAnalyzer] -- An extension point that makes it possible to provide language-specific analyzers (see MultiLingualSupport proposal). ''Since it is still in the development stage, it is not in the released javadoc.''
 == Setup ==
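The relationship between an ''extension-point'' and its ''extensions'' described in the changed page can be sketched in plain Java. This is a hypothetical, self-contained illustration, not Nutch code: SimpleUrlFilter stands in for the URLFilter interface and only mirrors the contract assumed here (return the URL, possibly rewritten, to keep it, or null to reject it). PrefixUrlFilter, ImageSuffixFilter and applyAll are invented names, and applyAll assumes that all active extensions of a point are run one after another, as the IndexingFilter entry states for its own extension point.

```java
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical sketch of a URLFilter-style extension point and two extensions.
 * None of these types are Nutch classes; they only mirror the assumed contract.
 */
public class UrlFilterChainSketch {

    /** Hypothetical stand-in for the extension-point interface: null means "reject". */
    public interface SimpleUrlFilter {
        String filter(String url);
    }

    /** Hypothetical extension: accept only URLs that start with a given prefix. */
    public static class PrefixUrlFilter implements SimpleUrlFilter {
        private final String prefix;

        public PrefixUrlFilter(String prefix) {
            this.prefix = prefix;
        }

        @Override
        public String filter(String url) {
            return url.startsWith(prefix) ? url : null;
        }
    }

    /** Hypothetical extension: reject URLs ending in common image suffixes. */
    public static class ImageSuffixFilter implements SimpleUrlFilter {
        private static final List<String> SUFFIXES = List.of(".gif", ".jpg", ".png");

        @Override
        public String filter(String url) {
            String lower = url.toLowerCase(Locale.ROOT);
            for (String suffix : SUFFIXES) {
                if (lower.endsWith(suffix)) {
                    return null;
                }
            }
            return url;
        }
    }

    /**
     * Runs every filter in order, passing each one the previous result;
     * the first null result rejects the URL.
     */
    public static String applyAll(List<SimpleUrlFilter> filters, String url) {
        String current = url;
        for (SimpleUrlFilter f : filters) {
            current = f.filter(current);
            if (current == null) {
                return null;
            }
        }
        return current;
    }

    public static void main(String[] args) {
        List<SimpleUrlFilter> filters = List.of(
                new PrefixUrlFilter("http://lucene.apache.org/"),
                new ImageSuffixFilter());
        System.out.println(applyAll(filters, "http://lucene.apache.org/nutch/"));
        System.out.println(applyAll(filters, "http://lucene.apache.org/logo.png"));
        System.out.println(applyAll(filters, "http://example.com/"));
    }
}
```

Running main prints the first URL unchanged and null for the other two. In a real plugin the implementing class would also be declared in the plugin's own plugin.xml so that Nutch can discover it; the sketch skips that step.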
