The following page has been changed by DawidWeiss:
http://wiki.apache.org/nutch/ClusteringPlugin

The comment on the change is:
Updated the info about clustering plugin and instructions.

------------------------------------------------------------------------------
- -- Main.DawidWeiss - 01 Dec 2004
+ = Clustering Plugin =
- * plugin name: Online Search Results Clustering using Carrot2's Lingo component
+ plugin name:: Online Search Results Clustering using Carrot2 components
- * plugin version: 0.9.0
+ plugin version:: 1.0.3
+ == Plugin Info ==
- * provider: Dawid Weiss, The Carrot2 project
- * plugin home url: Included in Nutch CVS. Home WWW of the project: http://carrot2.sourceforge.net
- * plugin download url: A binary is included in Nutch CVS. The plugin builds together with Nutch.
- * license: BSD-style
- * short description: Search results clustering plugin.
+ * provider: The Carrot2 project, [http://www.carrot2.org]
+ * plugin home url: Plugin is included in Nutch codebase.
+ * plugin download url: Binaries included with Nutch.
+ * license: BSD-style
+ * short description: Plugin for clustering search results at query-time.
- * long description: A plugin that clusters search results into groups of (related, hopefully) documents.
+ * long description: This plugin organizes search results into groups of (related, hopefully) documents.
- * configureable parameters: Take a look at the defaults defined in nutch-default.xml (search for 'clustering').
+ * configureable parameters: Take a look at the defaults defined in nutch-default.xml (search for 'clustering').
- * meta data added to index: None. Clustering is performed dynamically for each result set.
+ * meta data added to index: None. Clustering is performed dynamically for each result set.
+ * required jars: The entire `lib` folder in the plugin must be present in classpath. More JARs might be needed from the Carrot2 project if additional algorithms or languages are to be used.
- * required jars: Many - the entire lib folder in the plugin must be present in classpath.
- * plugin extension points:
-
- * plugin extension point interface: net.nutch.clustering.OnlineClusterer
+ * plugin extension point interface: net.nutch.clustering.OnlineClusterer
- * plugin extension point xml snippet: ?
- = Installation guide
+ == Installation guide ==
- * Create some index using the instructions provided in Nutch documentation,
+ * Create a search index using the instructions provided in Nutch documentation.
- * Deploy Nutch Web application and make sure the index is found and works (type a query and see if you
+ * Deploy Nutch Web application and make sure the index is found and searching works (type a query and see if you get any results).
- get any results).
+ * Stop the web server (Tomcat, Jetty or anything you like).
+ * Modify `WEB-INF/classes/nutch-default.xml` file and include the clustering plugin (it is by default ignored) by adding `clustering-carrot2` to `plugin.includes` property.
+ * Restart your web server and reload the search page. You should see the `clustering` checkbox next to `search` button. Enable it and rerun your query. Cluster labels and documents should appear to the right of search results.
- * Stop Web container (Tomcat)
- * You must modify =WEB-INF/classes/nutch-default.xml= file and include the clustering plugin (it is by default
- ignored).
-
- <property>
-   <name>plugin.includes</name>
-   <value>protocol-http|parse-(text|html)|index-basic|query-(basic|site|url)|clustering-carrot2</value>
-   <description>Regular expression naming plugin directory names to
-   include. Any plugin not matching this expression is excluded. By
-   default Nutch includes crawling just HTML and plain text via HTTP,
-   and basic indexing and search plugins.
-   </description>
- </property>
-
- * Restart Tomcat.
-
- * Reload the search page of Nutch. You should see the =clustering= checkbox next to =search= button.
- Enable it and rerun your query. Clustered results should appear to the right.
-
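The installation guide enables the plugin by adding `clustering-carrot2` to `plugin.includes`, a regular expression naming plugin directory names to include; any plugin not matching it is excluded. The Java sketch below only illustrates that rule with `java.util.regex` and is not Nutch code: the class, constant and method names are hypothetical, and it assumes the expression must match the whole directory name (`Matcher.matches()`), which may differ from how Nutch's plugin repository actually applies it.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper illustrating the plugin.includes rule; not part of Nutch.
public class PluginIncludesCheck {
    // Include expression from the wiki snippet, without the clustering plugin.
    static final String DEFAULT_INCLUDES =
        "protocol-http|parse-(text|html)|index-basic|query-(basic|site|url)";
    // Same expression with clustering-carrot2 appended, as the installation guide instructs.
    static final String WITH_CLUSTERING = DEFAULT_INCLUDES + "|clustering-carrot2";

    // Assumption: a plugin directory is included only if its whole name matches the expression.
    static boolean isIncluded(String includes, String pluginDir) {
        return Pattern.compile(includes).matcher(pluginDir).matches();
    }

    public static void main(String[] args) {
        List<String> dirs = List.of("protocol-http", "parse-html", "parse-pdf", "clustering-carrot2");
        for (String dir : dirs) {
            // Compare the result before and after adding clustering-carrot2.
            System.out.println(dir
                + ": default=" + isIncluded(DEFAULT_INCLUDES, dir)
                + ", with clustering=" + isIncluded(WITH_CLUSTERING, dir));
        }
    }
}
```

Under this assumption, `clustering-carrot2` is reported as excluded by the first expression and included by the second, while `parse-pdf` stays excluded by both because no alternative names it.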