Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The "PluginCentral" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/PluginCentral?action=diff&rev1=69&rev2=70

- Plugins provide a large part of the functionality of nutch. This page acts as an up-to-date resource for supported plugins for Nutch 1.3. '''N.B.''' There is a wealth of information regarding pre-Nutch 1.3 plugin development available [[|here]]
+ Plugins provide a large part of the functionality of nutch. This page acts as an up-to-date resource for supported plugins for Nutch 1.3. '''N.B.''' There is a wealth of information regarding pre-Nutch 1.3 plugin development available [[OldPluginCentral|here]]

   * [[WhyNutchHasAPluginSystem]]
   * AboutPlugins - General information on what plugins are and how they work.
@@ -15, +15 @@

   * [[http://www.ryanpfister.com/2009/04/how-to-sort-by-date-with-nutch/|Writing a plugin to add dates]] by Ryan Pfister
   * HowToMakeCustomSearch - A custom plugin enabling us to search for the author of a website in our index by his email id. (N.B. This plugin is for Nutch release 1.0)

- == Plugins that Come with Nutch (0.9) ==
-
- In order to get Nutch to use any of these plugins, you just need to edit your conf/nutch-site.xml file and add the name of the plugin to the list of plugin.includes.
-
-  * '''[[ClusteringPlugin|clustering-carrot2]]''' - Online Search Results Clustering using Carrot2's components.
-  * '''creativecommons''' - Support for crawling and searching Creative-Commons licensed content.
-  * '''index-basic''' - Adds url, content and anchor fields to the index.
-  * '''index-more''' - Adds date, content-length, contentType, primaryType and subtype fields to the index.
-  * '''languageidentifier''' - Adds a lang field to the index and allows you to query against it.
-  * '''[[OntologyPlugin|ontology]]''' - Helps refine queries based on owl files.
-  * '''parse-ext''' - A wrapper that invokes external command to do real parsing job.
-  * '''parse-html''' - Parses HTML documents
-  * '''parse-js''' - Parses Java``Script
-  * '''parse-mp3''' - Parses MP3s
-  * '''parse-zip''' - Parses ZIP archives
-  * '''parse-mspowerpoint''' - Parses Microsoft Powerpoint files
-  * '''parse-msword''' - Parses MS Word documents
-  * '''parse-msexcel''' - Parses MS Excel documents
-  * '''parse-pdf''' - Parses PDFs
-  * '''parse-rss''' - Parses RSS feeds
-  * '''parse-oo''' - Parses OpenOffice files
-  * '''parse-swf''' - Parses Shockwave Flash
-  * '''parse-rtf''' - Parses RTF files
-  * '''parse-text''' - Parses text documents
-  * '''protocol-file''' - Retreives documents from the filesystem
-  * '''protocol-ftp''' - Retreives documents through ftp
-  * '''protocol-http''' - Retreives documents through http
-  * '''protocol-httpclient''' - Retreives documents through http and https
-  * '''query-basic''' - Runs queries against content, url and anchor fields
-  * '''query-more''' - Runs queries against date, content-length, contentType, primaryType and subType fields.
-  * '''query-site''' - Runs queries against site field
-  * '''query-url''' - Runs queries against url field.
-  * '''urlfilter-prefix'''
-  * '''urlfilter-regex'''
-
- == Additional Plugins in Dev Branch (0.8) ==
-
-  * '''analysis-de'''
-  * '''analysis-fr'''
-  * '''lib-commons-httpclient'''
-  * '''lib-http'''
-  * '''lib-jakarta-poi'''
-  * '''lib-log4j'''
-  * '''lib-lucene-analyzers''' - Lucene analyzers
-  * '''lib-nekohtml''' - automatic tag balancer
-  * '''lib-parsems''' - parse ms documents framework
-  * '''parse-msexcel''' - Parses MS Excel documents
-  * '''parse-mspowerpoint''' - Parses MS Powerpoint documents
-  * '''parse-oo''' - Parses Open Office and Star Office documents (Extentsions: ODT, OTT, ODH, ODM, ODS, OTS, ODP, OTP, SXW, STW, SXC, STC, SXI, STI)
-  * '''parse-swf''' - Parses Flash SWF files
-  * '''microformats-reltag''' - Adds [[http://www.microformats.org/wiki/Rel-Tag|rel-tag]] fields to the index and runs queries against them.
-  * '''parse-zip'''
-
  == Plugins You can Download ==

   * LanguageIdentifierPlugin

