Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The following page has been changed by DawidWeiss:
http://wiki.apache.org/nutch/ClusteringPlugin

The comment on the change is:
Updated the info about clustering plugin and instructions.

------------------------------------------------------------------------------
- -- Main.DawidWeiss - 01 Dec 2004
+ = Clustering Plugin =
  
- * plugin name: Online Search Results Clustering using Carrot2's Lingo component
+  plugin name:: Online Search Results Clustering using Carrot2 components
- * plugin version: 0.9.0
+  plugin version:: 1.0.3
  
+ == Plugin Info ==
- * provider: Dawid Weiss, The Carrot2 project
- * plugin home url: Included in Nutch CVS. Home WWW of the project: http://carrot2.sourceforge.net
- * plugin download url: A binary is included in Nutch CVS. The plugin builds together with Nutch.
- * license: BSD-style
  
- * short description: Search results clustering plugin.
+  * provider: The Carrot2 project, [http://www.carrot2.org]
+  * plugin home url: Plugin is included in Nutch codebase.
+  * plugin download url: Binaries included with Nutch.
+  * license: BSD-style
+  * short description: Plugin for clustering search results at query-time.
- * long description: A plugin that clusters search results into groups of (related, hopefully) documents.
+  * long description: This plugin organizes search results into groups of (related, hopefully) documents.
- * configureable parameters: Take a look at the defaults defined in nutch-default.xml (search for 'clustering').
+  * configurable parameters: Take a look at the defaults defined in nutch-default.xml (search for 'clustering').
- * meta data added to index: None. Clustering is performed dynamically for each result set.
+  * meta data added to index: None. Clustering is performed dynamically for each result set.
+  * required jars: The entire `lib` folder in the plugin must be present in the classpath. More JARs might be needed from the Carrot2 project if additional algorithms or languages are to be used.
- * required jars: Many - the entire lib folder in the plugin must be present in classpath.
- * plugin extension points:
- 
- * plugin extension point interface: net.nutch.clustering.OnlineClusterer
+  * plugin extension point interface: net.nutch.clustering.OnlineClusterer
- * plugin extension point xml snippet: ?
  
  
- = Installation guide
+ == Installation guide ==
  
- * Create some index using the instructions provided in Nutch documentation,
+  * Create a search index using the instructions provided in the Nutch documentation.
- * Deploy Nutch Web application and make sure the index is found and works (type a query and see if you
+  * Deploy the Nutch web application and make sure the index is found and searching works (type a query and see if you get any results).
- get any results).
+  * Stop the web server (Tomcat, Jetty or anything you like).
+  * Modify the `WEB-INF/classes/nutch-default.xml` file to include the clustering plugin (it is ignored by default) by adding `clustering-carrot2` to the `plugin.includes` property.
+  * Restart your web server and reload the search page. You should see the `clustering` checkbox next to the `search` button. Enable it and rerun your query. Cluster labels and documents should appear to the right of the search results.
  
- * Stop Web container (Tomcat)
- * You must modify =WEB-INF/classes/nutch-default.xml= file and include the clustering plugin (it is by default
- ignored).
- 
- plugin.includes
- 
- protocol-http|parse-(text|html)|index-basic|query-(basic|site|url)|clustering-carrot2
- Regular expression naming plugin directory names to
- 
- include.  Any plugin not matching this expression is excluded.  By
- default Nutch includes crawling just HTML and plain text via HTTP,
- and basic indexing and search plugins.
- 
- * Restart Tomcat.
- 
- * Reload the search page of Nutch. You should see the =clustering= checkbox next to =search= button.
- Enable it and rerun your query. Clustered results should appear to the right.
- 
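
For reference, the `plugin.includes` change from the installation guide might look like the fragment below. This is only a sketch: it assumes the usual `<property>` layout of `nutch-default.xml`, and the value shown is the one quoted in the earlier revision of the page, so the other plugins listed in your own file may differ. The only part that matters here is the trailing `|clustering-carrot2`.

```xml
<!-- Sketch of the plugin.includes entry in WEB-INF/classes/nutch-default.xml.
     Assumption: the surrounding <property> layout; keep whatever plugins your
     file already lists and append clustering-carrot2 to the expression. -->
<property>
  <name>plugin.includes</name>
  <value>protocol-http|parse-(text|html)|index-basic|query-(basic|site|url)|clustering-carrot2</value>
  <description>Regular expression naming plugin directory names to
  include.  Any plugin not matching this expression is excluded.</description>
</property>
```

Because the value is a regular expression over plugin directory names, the new entry is joined to the existing alternatives with `|` rather than added as a separate property.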
