I will add my patch within 3 to 4 days. I am done with everything except that I still need to write some test classes.

Thanks
Pallavi

Robin Anil (JIRA) wrote:
[ https://issues.apache.org/jira/browse/MAHOUT-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830056#action_12830056 ]
Robin Anil commented on MAHOUT-153:
-----------------------------------

Any progress on this? Will it be ready soon, or should it be pushed to the 0.4 
release?

Implement kmeans++ for initial cluster selection in kmeans
----------------------------------------------------------

                Key: MAHOUT-153
                URL: https://issues.apache.org/jira/browse/MAHOUT-153
            Project: Mahout
         Issue Type: New Feature
         Components: Clustering
   Affects Versions: 0.2
        Environment: OS Independent
           Reporter: Panagiotis Papadimitriou
            Fix For: 0.3

  Original Estimate: 336h
 Remaining Estimate: 336h

The current implementation of k-means includes the following algorithms for 
initial cluster selection (seed selection): 1) random selection of k points, 2) 
use of canopy clusters.
I plan to implement k-means++. The details of the algorithm are available here: 
http://www.stanford.edu/~darthur/kMeansPlusPlus.pdf.
Design Outline: I will create an abstract class SeedGenerator and a subclass 
KMeansPlusPlusSeedGenerator. The existing class RandomSeedGenerator will become 
a subclass of SeedGenerator.
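
For illustration only, here is a minimal, self-contained sketch of how the design outlined above might look. It is not the patch attached to this issue and it does not use Mahout's actual APIs: the method name chooseSeeds, the use of plain double[] points held in memory, and the simplified RandomSeedGenerator stand-in are all assumptions made for the example. The k-means++ step follows the D^2 sampling idea from the linked paper: the first seed is picked uniformly at random, and each later seed is picked with probability proportional to its squared distance from the nearest seed chosen so far.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical base class; the real signature in the patch may differ. */
abstract class SeedGenerator {
    /** Returns k initial cluster centers chosen from the given points. */
    abstract List<double[]> chooseSeeds(List<double[]> points, int k, Random rng);

    static void checkArguments(List<double[]> points, int k) {
        if (k < 1 || k > points.size()) {
            throw new IllegalArgumentException("k must be between 1 and the number of points");
        }
    }

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }
}

/** Simplified stand-in for the existing random selection of k points. */
class RandomSeedGenerator extends SeedGenerator {
    @Override
    List<double[]> chooseSeeds(List<double[]> points, int k, Random rng) {
        checkArguments(points, k);
        List<double[]> shuffled = new ArrayList<>(points);
        Collections.shuffle(shuffled, rng);
        return new ArrayList<>(shuffled.subList(0, k));
    }
}

/** k-means++ seeding via D^2 sampling (in-memory, assumption for this sketch). */
class KMeansPlusPlusSeedGenerator extends SeedGenerator {
    @Override
    List<double[]> chooseSeeds(List<double[]> points, int k, Random rng) {
        checkArguments(points, k);
        List<double[]> seeds = new ArrayList<>();
        // First seed: uniform random choice.
        seeds.add(points.get(rng.nextInt(points.size())));

        // minDist[i] = squared distance from point i to its nearest seed so far.
        double[] minDist = new double[points.size()];
        Arrays.fill(minDist, Double.POSITIVE_INFINITY);

        while (seeds.size() < k) {
            double[] latest = seeds.get(seeds.size() - 1);
            double total = 0.0;
            for (int i = 0; i < points.size(); i++) {
                minDist[i] = Math.min(minDist[i], squaredDistance(points.get(i), latest));
                total += minDist[i];
            }
            // Pick index i with probability minDist[i] / total.
            double target = rng.nextDouble() * total;
            double cumulative = 0.0;
            int chosen = points.size() - 1; // fallback if every distance is zero
            for (int i = 0; i < points.size(); i++) {
                cumulative += minDist[i];
                if (cumulative > target) {
                    chosen = i;
                    break;
                }
            }
            seeds.add(points.get(chosen));
        }
        return seeds;
    }
}
```

Because both generators share the same abstract method, a k-means driver could accept any SeedGenerator and switch between random and k-means++ seeding without other changes. A distributed version would need to compute the distance sums over the whole data set in each round, which this in-memory sketch does not attempt.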
