You should state your requirements clearly:
1. What data do you want to cluster? (the whole index or search results)
2. What is the role of the extension? How is it going to be used?
(front-end clusters, query refinement, etc.)
3. Do you need a clustering implementation in the source code, or just
an API for clustering? (I'd personally stick to the API; there are many
products out there that perform clustering. Carrot2 is no exception --
it has an excellent (in my humble opinion :) open source clustering
algorithm, Lingo, but there is also a commercial component that is much
faster and more customizable. You can start off with an open source
clusterer and switch to a commercial product later if you want higher
scalability or different functionality. I implemented such an API in
Nutch -- take a look at its source code for hints.)
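
To make the "stick to the API" point concrete, here is a minimal sketch of
what such a boundary might look like. All names in it (Snippet, Cluster,
SnippetClusterer, FirstWordClusterer) are hypothetical and made up for
illustration; this is not the actual Nutch or Carrot2 API, and the toy
clusterer only groups hits by the first word of the title so that the
example runs on its own.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical search hit: just a title and a summary. */
final class Snippet {
    final String title;
    final String summary;

    Snippet(String title, String summary) {
        this.title = title;
        this.summary = summary;
    }
}

/** Hypothetical labelled group of hits. */
final class Cluster {
    final String label;
    final List<Snippet> members = new ArrayList<>();

    Cluster(String label) {
        this.label = label;
    }
}

/** The extension point: callers depend only on this interface. */
interface SnippetClusterer {
    List<Cluster> cluster(List<Snippet> hits);
}

/**
 * Toy stand-in implementation (an assumption for this sketch, not a real
 * algorithm such as Lingo): groups hits by the lower-cased first word
 * of the title, keeping clusters in order of first appearance.
 */
final class FirstWordClusterer implements SnippetClusterer {
    @Override
    public List<Cluster> cluster(List<Snippet> hits) {
        Map<String, Cluster> byLabel = new LinkedHashMap<>();
        for (Snippet hit : hits) {
            String trimmed = hit.title.trim();
            String label = trimmed.isEmpty()
                    ? "(untitled)"
                    : trimmed.split("\\s+")[0].toLowerCase(Locale.ROOT);
            byLabel.computeIfAbsent(label, Cluster::new).members.add(hit);
        }
        return new ArrayList<>(byLabel.values());
    }
}
```

Code that displays or refines results would hold a SnippetClusterer
reference only, so replacing the toy class with a wrapper around an open
source or commercial clusterer would not touch the callers.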
Dawid
Lorenzo wrote:
I see some noise about clustering and Lucene, but I'm still waiting for
someone who will help me create a clustering extension.
I know both Carrot2 and Weka (the first can be integrated with Lucene, the
latter may be - Falko, can you tell me?) but would like to write something
that could be included in the sandbox (or similar), with whichever
implementation we find best for a general-purpose environment. Maybe Carrot2
or another will be the best one (I really hope so, I'm a lazy coder ;-) ) and so
we will simply ask Dawid to extend his code, but first I want to make some
tests.
bye
Lorenzo