[Nutch Wiki] Trivial Update of "PluginCentral" by LewisJohnMcgibbney

Apache Wiki Wed, 13 Jul 2011 02:24:34 -0700

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.


The "PluginCentral" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/PluginCentral?action=diff&rev1=69&rev2=70

- Plugins provide a large part of the functionality of nutch. This page acts as 
an up-to-date resource for supported plugins for Nutch 1.3. '''N.B.''' There is 
a wealth of information regarding pre-Nutch 1.3 plugin development available 
[[|here]]
+ Plugins provide a large part of the functionality of nutch. This page acts as 
an up-to-date resource for supported plugins for Nutch 1.3. '''N.B.''' There is 
a wealth of information regarding pre-Nutch 1.3 plugin development available 
[[OldPluginCentral|here]]
  
   * [[WhyNutchHasAPluginSystem]]
   * AboutPlugins - General information on what plugins are and how they work.
@@ -15, +15 @@

   * [[http://www.ryanpfister.com/2009/04/how-to-sort-by-date-with-nutch/|Writing a plugin to add dates]] by Ryan Pfister
   * HowToMakeCustomSearch - A custom plugin enabling us to search for the 
author of a website in our index by his email id. (N.B. This plugin is for 
Nutch release 1.0)
  
- == Plugins that Come with Nutch (0.9) ==
- 
- In order to get Nutch to use any of these plugins, you just need to edit your 
conf/nutch-site.xml file and add the name of the plugin to the list of 
plugin.includes.
- 
-  * '''[[ClusteringPlugin|clustering-carrot2]]''' - Online Search Results 
Clustering using Carrot2's components.
-  * '''creativecommons''' - Support for crawling and searching 
Creative-Commons licensed content.
-  * '''index-basic''' - Adds url, content and anchor fields to the index.
-  * '''index-more''' - Adds date, content-length, contentType, primaryType and 
subtype fields to the index.
-  * '''languageidentifier''' - Adds a lang field to the index and allows you 
to query against it.
-  * '''[[OntologyPlugin|ontology]]''' - Helps refine queries based on owl 
files.
-  * '''parse-ext''' - A wrapper that invokes external command to do real 
parsing job.
-  * '''parse-html''' - Parses HTML documents
-  * '''parse-js''' - Parses Java``Script
-  * '''parse-mp3''' - Parses MP3s
-  * '''parse-zip''' - Parses ZIP archives
-  * '''parse-mspowerpoint''' - Parses Microsoft Powerpoint files
-  * '''parse-msword''' - Parses MS Word documents
-  * '''parse-msexcel''' - Parses MS Excel documents
-  * '''parse-pdf''' - Parses PDFs
-  * '''parse-rss''' - Parses RSS feeds
-  * '''parse-oo''' - Parses OpenOffice files
-  * '''parse-swf''' - Parses Shockwave Flash
-  * '''parse-rtf''' - Parses RTF files
-  * '''parse-text''' - Parses text documents
-  * '''protocol-file''' - Retrieves documents from the filesystem
-  * '''protocol-ftp''' - Retrieves documents through ftp
-  * '''protocol-http''' - Retrieves documents through http
-  * '''protocol-httpclient''' - Retrieves documents through http and https
-  * '''query-basic''' - Runs queries against content, url and anchor fields
-  * '''query-more''' - Runs queries against date, content-length, contentType, 
primaryType and subType fields.
-  * '''query-site''' - Runs queries against site field
-  * '''query-url''' - Runs queries against url field.
-  * '''urlfilter-prefix'''
-  * '''urlfilter-regex'''
- 
- == Additional Plugins in Dev Branch (0.8) ==
- 
-  * '''analysis-de'''
-  * '''analysis-fr'''
-  * '''lib-commons-httpclient'''
-  * '''lib-http'''
-  * '''lib-jakarta-poi'''
-  * '''lib-log4j''' 
-  * '''lib-lucene-analyzers''' - Lucene analyzers
-  * '''lib-nekohtml''' - automatic tag balancer 
-  * '''lib-parsems''' - parse ms documents framework
-  * '''parse-msexcel''' - Parses MS Excel documents
-  * '''parse-mspowerpoint''' - Parses MS Powerpoint documents
-  * '''parse-oo''' - Parses Open Office and Star Office documents 
(Extensions: ODT, OTT, ODH, ODM, ODS, OTS, ODP, OTP, SXW, STW, SXC, STC, SXI, 
STI)
-  * '''parse-swf''' - Parses Flash SWF files
-  * '''microformats-reltag''' - Adds 
[[http://www.microformats.org/wiki/Rel-Tag|rel-tag]] fields to the index and 
runs queries against them.
-  * '''parse-zip'''
- 
  == Plugins You can Download ==
  
   * LanguageIdentifierPlugin
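
The removed "Plugins that Come with Nutch (0.9)" section above says that a plugin is enabled by adding its name to plugin.includes in conf/nutch-site.xml. A minimal sketch of such an entry follows. It assumes the property value is a regular expression over plugin names, as in the stock conf/nutch-default.xml, and the particular selection of plugins (all taken from the removed list) is illustrative only, not a recommended set:

```xml
<?xml version="1.0"?>
<!-- conf/nutch-site.xml: illustrative fragment, not a complete configuration -->
<configuration>
  <property>
    <name>plugin.includes</name>
    <!-- Alternation of plugin names; this selection is an assumption, adjust it to your installation -->
    <value>protocol-http|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)</value>
  </property>
</configuration>
```

A property set in nutch-site.xml overrides the one of the same name in nutch-default.xml, so any plugin that should stay active needs to appear in the value.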
