Gaurang,
You can fetch documents from the Nutch indexes (which are plain Lucene indexes)
and then feed them to the clustering algorithm directly, as explained in the
Carrot2 examples here:
http://download.carrot2.org/head/manual/index.html#section.integration
There are several examples you can start from -- some of them accept raw data,
some of them use a Lucene document source:
http://fisheye3.atlassian.com/browse/carrot2/branches/stable/applications/carrot2-examples/src/org/carrot2/examples/clustering
If you need ultimate flexibility, go with the raw-data example:
http://fisheye3.atlassian.com/browse/carrot2/branches/stable/applications/carrot2-examples/src/org/carrot2/examples/clustering/ClusteringDocumentList.java?r=3345
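To give a rough idea of what the raw-data route involves, here is a minimal plain-Java sketch of the preparation step: turning stored index fields into (url, title, snippet) records. Everything in it is an assumption made for illustration: the `RawDocs` and `RawDoc` names are hypothetical, the field names "url", "title" and "content" depend on how your index was built, and the maps stand in for whatever you read from the Lucene index. The actual Carrot2 call is only mentioned in a comment; check the linked example for the exact API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch only: prepares raw (url, title, snippet) records for a clustering call. */
class RawDocs {

    /** Hypothetical holder for one clustering input; not a Carrot2 or Nutch class. */
    static final class RawDoc {
        final String url;
        final String title;
        final String snippet;

        RawDoc(String url, String title, String snippet) {
            this.url = url;
            this.title = title;
            this.snippet = snippet;
        }
    }

    private static String orEmpty(String s) {
        return s == null ? "" : s;
    }

    /**
     * Converts stored fields (one map per index hit) into raw documents.
     * The field names "url", "title" and "content" are assumptions.
     * Hits without a URL are skipped; content is cut to maxSnippetChars.
     */
    static List<RawDoc> fromStoredFields(List<Map<String, String>> hits, int maxSnippetChars) {
        List<RawDoc> docs = new ArrayList<>();
        for (Map<String, String> hit : hits) {
            String url = hit.get("url");
            if (url == null) {
                continue;
            }
            String content = orEmpty(hit.get("content"));
            String snippet = content.length() > maxSnippetChars
                    ? content.substring(0, maxSnippetChars)
                    : content;
            docs.add(new RawDoc(url, orEmpty(hit.get("title")), snippet));
        }
        return docs;
    }

    public static void main(String[] args) {
        // Stand-in for fields read from the index; the values are made up.
        Map<String, String> hit = new HashMap<>();
        hit.put("url", "http://example.org/page");
        hit.put("title", "Example page");
        hit.put("content", "Some text extracted from the crawled page.");

        List<Map<String, String>> hits = new ArrayList<>();
        hits.add(hit);

        for (RawDoc d : fromStoredFields(hits, 20)) {
            // With Carrot2 on the classpath, each record would be wrapped in a
            // Carrot2 document and the list passed to the clustering controller,
            // as done in ClusteringDocumentList.java.
            System.out.println(d.url + " | " + d.title + " | " + d.snippet);
        }
    }
}
```

Keeping this step separate from the clustering call means you can swap the source (index, segments, a database) without touching the Carrot2 side.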
Dawid
Gaurang Patel wrote:
Hi all,
Does anyone know how I can use Nutch crawl results for clustering with the
Carrot2 clustering engine? What I want is different from the Carrot2
clustering plugin that comes with Nutch. I want to write my own code to
retrieve a document list from the Nutch crawl results and then supply
this list to the Carrot2 algorithm.
Any quick help will be appreciated.
Regards,
Gaurang