Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The following page has been changed by ThomasRichter:
http://wiki.apache.org/nutch/PluginCentral

The comment on the change is:
moved some plugins between 0.7 and 0.8 to correctly show what is in each branch

------------------------------------------------------------------------------
  In order to get Nutch to use any of these plugins, you just need to edit your conf/nutch-site.xml file and add the name of the plugin to the list of plugin.includes.

+ * '''clustering-carrot2'''
+ * '''creativecommons'''
  * '''index-basic''' - Adds url, content and anchor fields to the index.
  * '''index-more''' - Adds date, content-length, contentType, primaryType and subtype fields to the index.
  * '''languageidentifier''' - Adds a lang field to the index and allows you to query against it.
- * '''microformats-reltag''' - Adds [http://www.microformats.org/wiki/Rel-Tag rel-tag] fields to the index and runs queries against them.
  * '''[wiki:OntologyPlugin ontology]''' - Helps refine queries based on owl files.
  * '''parse-ext''' - A wrapper that invokes external command to do real parsing job.
  * '''parse-html''' - Parses HTML documents
  * '''parse-js''' - Parses Java``Script
  * '''parse-mp3''' - Parses MP3s
- * '''parse-msexcel''' - Parses MS Excel documents
- * '''parse-mspowerpoint''' - Parses MS Powerpoint documents
  * '''parse-msword''' - Parses MS Word documents
  * '''parse-pdf''' - Parses PDFs
  * '''parse-rss''' - Parses RSS feeds
  * '''parse-rtf''' - Parses RTF files
- * '''parse-swf''' - Parses Flash SWF files
  * '''parse-text''' - Parses text documents
  * '''protocol-file''' - Retreives documents from the filesystem
  * '''protocol-ftp''' - Retreives documents through ftp
@@ -45, +43 @@

  * '''analysis-de'''
  * '''analysis-fr'''
- * '''clustering-carrot2'''
- * '''creativecommons'''
  * '''lib-commons-httpclient'''
  * '''lib-http'''
  * '''lib-jakarta-poi'''
@@ -54, +50 @@

  * '''lib-lucene-analyzers'''
  * '''lib-nekohtml'''
  * '''lib-parsems'''
+ * '''parse-msexcel''' - Parses MS Excel documents
+ * '''parse-mspowerpoint''' - Parses MS Powerpoint documents
+ * '''parse-swf''' - Parses Flash SWF files
+ * '''microformats-reltag''' - Adds [http://www.microformats.org/wiki/Rel-Tag rel-tag] fields to the index and runs queries against them.
  * '''parse-zip'''

  == Plugins You can Download ==
