I will add my patch within 3 to 4 days. I am done with everything
except that I still need to write some test classes.
Thanks
Pallavi
Robin Anil (JIRA) wrote:
[ https://issues.apache.org/jira/browse/MAHOUT-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830056#action_12830056 ]
Robin Anil commented on MAHOUT-153:
-----------------------------------
Any progress on this? Will it be ready soon, or should it be pushed to the 0.4
release?
Implement kmeans++ for initial cluster selection in kmeans
----------------------------------------------------------
Key: MAHOUT-153
URL: https://issues.apache.org/jira/browse/MAHOUT-153
Project: Mahout
Issue Type: New Feature
Components: Clustering
Affects Versions: 0.2
Environment: OS Independent
Reporter: Panagiotis Papadimitriou
Fix For: 0.3
Original Estimate: 336h
Remaining Estimate: 336h
The current implementation of k-means includes the following algorithms for
initial cluster selection (seed selection): 1) random selection of k points, 2)
use of canopy clusters.
I plan to implement k-means++. The details of the algorithm are available here:
http://www.stanford.edu/~darthur/kMeansPlusPlus.pdf.
Design Outline: I will create an abstract class SeedGenerator and a subclass
KMeansPlusPlusSeedGenerator. The existing class RandomSeedGenerator will become
a subclass of SeedGenerator.
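
As a rough illustration of the design outline above, a minimal Java sketch of
the class hierarchy and of k-means++ seeding (D^2 sampling) might look like the
following. It is not the Mahout API and not the attached patch: the method name
chooseSeeds, the use of plain double[] points held in memory, and the squared
Euclidean distance are all assumptions made for this example, and the
RandomSeedGenerator shown here is only a simplified stand-in for the existing
class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical base class: a strategy that picks k initial centers. */
abstract class SeedGenerator {
    /** Returns k points from the input to use as initial cluster centers. */
    abstract List<double[]> chooseSeeds(List<double[]> points, int k, Random rng);

    /** Squared Euclidean distance (an assumption; any distance measure could be used). */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    static void checkK(List<double[]> points, int k) {
        if (k < 1 || k > points.size()) {
            throw new IllegalArgumentException("k must be between 1 and points.size()");
        }
    }
}

/** Simplified stand-in for the existing strategy: k points chosen uniformly at random. */
class RandomSeedGenerator extends SeedGenerator {
    @Override
    List<double[]> chooseSeeds(List<double[]> points, int k, Random rng) {
        checkK(points, k);
        List<double[]> shuffled = new ArrayList<>(points);
        Collections.shuffle(shuffled, rng);
        return new ArrayList<>(shuffled.subList(0, k));
    }
}

/** k-means++ seeding: each new center is drawn with probability proportional to D(x)^2. */
class KMeansPlusPlusSeedGenerator extends SeedGenerator {
    @Override
    List<double[]> chooseSeeds(List<double[]> points, int k, Random rng) {
        checkK(points, k);
        List<double[]> seeds = new ArrayList<>();
        // The first center is chosen uniformly at random.
        seeds.add(points.get(rng.nextInt(points.size())));

        // minDist[i] holds the squared distance from point i to its nearest chosen center.
        double[] minDist = new double[points.size()];
        Arrays.fill(minDist, Double.POSITIVE_INFINITY);

        while (seeds.size() < k) {
            double[] last = seeds.get(seeds.size() - 1);
            double total = 0.0;
            for (int i = 0; i < points.size(); i++) {
                minDist[i] = Math.min(minDist[i], squaredDistance(points.get(i), last));
                total += minDist[i];
            }
            if (total == 0.0) {
                // Every point coincides with a chosen center; fall back to a uniform pick.
                seeds.add(points.get(rng.nextInt(points.size())));
                continue;
            }
            // Sample an index with probability minDist[i] / total.
            double target = rng.nextDouble() * total;
            double cumulative = 0.0;
            int chosen = -1;
            for (int i = 0; i < points.size(); i++) {
                if (minDist[i] == 0.0) {
                    continue; // zero weight: never selected
                }
                cumulative += minDist[i];
                chosen = i; // remembers the last positive-weight index in case of rounding
                if (cumulative > target) {
                    break;
                }
            }
            seeds.add(points.get(chosen));
        }
        return seeds;
    }
}
```

With this shape, a caller could switch strategies by changing only the
generator it constructs, for example
new KMeansPlusPlusSeedGenerator().chooseSeeds(points, k, new Random()).
A MapReduce version would need to compute the distance sums in a distributed
pass; the sketch deliberately ignores that.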