Clustering with Lucene?

vivek sar Tue, 26 Apr 2011 04:49:45 -0700

Hi,

  I've been researching about clustering with Lucene. Here is what
I've found so far,


1) Lucene clustering with Carrot2 -
http://download.carrot2.org/head/manual/#section.getting-started.lucene
   - but, this seems suitable for only smaller size index (few hundred
documents) -  
http://download.carrot2.org/head/manual/#section.advanced-topics.fine-tuning.choosing-algorithm

2) Lucene clustering with Mahout -
http://www.lucidimagination.com/blog/2010/03/16/integrating-apache-mahout-with-apache-lucene-and-solr-part-i-of-3
   - I'm not very sure if this is ready for prime time yet, there
seems to be very few examples on how to do this. Has anyone tried this
with large index size (millions of documents)?

3) Some clustering library in Lucene's contribution folder -
https://issues.apache.org/jira/browse/LUCENE-1421
   - this doesn't seem to be officially supported and last update to
it was an year back. Has anyone tried this?

4) Lucene clustering with Terracotta -
http://orionl.blogspot.com/2006/11/clustering-lucene.html
  - I'm not sure how to do this, there doesn't seem to be much
activity around this.

We got large indexes - over 500 million records, we usually partition
our index after 20 million records. There are total of 20 fields in
our index, of which we are trying to cluster 5 fields. Is there any
clustering solution for Lucene that would work for us? Carrot2 looked
the most active and promising, but it's clearly recommended for a
small index size. Any other suggestions?

Thanks,
-vivek

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Clustering with Lucene?

Reply via email to