[ https://issues.apache.org/jira/browse/MAHOUT-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796323#action_12796323 ]
Ted Dunning commented on MAHOUT-153:
------------------------------------

{quote}
On Mon, Jan 4, 2010 at 4:03 AM, Palleti, Pallavi <pallavi.pall...@corp.aol.com> wrote:

Initially, I used canopy clustering centers as initial seeds, but the results weren't good, and the number of clusters depends on the distance thresholds we give as input. Later, I considered randomly selecting some points from the input dataset and using them as initial seeds. Again, the results were not good. Now, I choose initial seeds from the input set in such a way that the points are far from each other, and I have observed better clustering using Fuzzy KMeans.

I have not yet implemented a MapReduce version of this seed selection. I will soon implement one and submit a patch.
{quote}

I encouraged Pallavi on the mailing list to submit his patches here on this issue. Hopefully he will be able to drive the process forward.

> Implement kmeans++ for initial cluster selection in kmeans
> ----------------------------------------------------------
>
>                 Key: MAHOUT-153
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-153
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Clustering
>    Affects Versions: 0.2
>         Environment: OS Independent
>            Reporter: Panagiotis Papadimitriou
>             Fix For: 0.3
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> The current implementation of k-means includes the following algorithms for
> initial cluster selection (seed selection): 1) random selection of k points,
> 2) use of canopy clusters.
> I plan to implement k-means++. The details of the algorithm are available
> here: http://www.stanford.edu/~darthur/kMeansPlusPlus.pdf.
> Design Outline: I will create an abstract class SeedGenerator and a subclass
> KMeansPlusPlusSeedGenerator. The existing class RandomSeedGenerator will
> become a subclass of SeedGenerator.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
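The design outline in this issue names an abstract SeedGenerator and a KMeansPlusPlusSeedGenerator subclass but gives no signatures. The following is a minimal sketch of that hierarchy with k-means++ seeding (first center chosen uniformly, each later center sampled with probability proportional to its squared distance D(x)^2 to the nearest already-chosen center). It assumes points are plain double[] arrays under Euclidean distance; the method name chooseSeeds, the constructor, and the helper squaredDistance are hypothetical and are not Mahout's actual API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical base class following the design outline; not Mahout's real API. */
abstract class SeedGenerator {
    /** Chooses k initial cluster centers from the given points (assumed signature). */
    abstract List<double[]> chooseSeeds(List<double[]> points, int k);

    /** Squared Euclidean distance between two points of equal dimension. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }
}

/** k-means++ seeding: D^2-weighted sampling of each new center (sketch). */
class KMeansPlusPlusSeedGenerator extends SeedGenerator {
    private final Random random;

    KMeansPlusPlusSeedGenerator(Random random) {
        this.random = random;
    }

    @Override
    List<double[]> chooseSeeds(List<double[]> points, int k) {
        int n = points.size();
        if (k <= 0 || k > n) {
            throw new IllegalArgumentException("k must be in [1, points.size()]");
        }
        List<double[]> seeds = new ArrayList<>(k);
        // Step 1: the first center is a uniformly random input point.
        seeds.add(points.get(random.nextInt(n)));

        // minDist[i] = squared distance from point i to its nearest chosen center.
        double[] minDist = new double[n];
        for (int i = 0; i < n; i++) {
            minDist[i] = squaredDistance(points.get(i), seeds.get(0));
        }

        while (seeds.size() < k) {
            double total = 0.0;
            for (double d : minDist) {
                total += d;
            }
            int chosen;
            if (total == 0.0) {
                // Every point coincides with a center; fall back to a uniform pick.
                chosen = random.nextInt(n);
            } else {
                // Step 2: pick index i with probability minDist[i] / total.
                double r = random.nextDouble() * total;
                chosen = -1;
                int lastPositive = -1;
                for (int i = 0; i < n; i++) {
                    if (minDist[i] > 0) {
                        lastPositive = i;
                    }
                    r -= minDist[i];
                    if (r < 0) {
                        chosen = i;
                        break;
                    }
                }
                if (chosen < 0) {
                    chosen = lastPositive; // guard against floating-point round-off
                }
            }
            double[] center = points.get(chosen);
            seeds.add(center);
            // Step 3: refresh each point's distance to its nearest center.
            for (int i = 0; i < n; i++) {
                minDist[i] = Math.min(minDist[i], squaredDistance(points.get(i), center));
            }
        }
        return seeds;
    }
}
```

Usage would be along the lines of `new KMeansPlusPlusSeedGenerator(new Random(42)).chooseSeeds(points, k)`. The approach described in the quoted mail, choosing seeds that are far from each other, resembles farthest-first traversal, which replaces the sampling in Step 2 with taking the index of the largest minDist; the mail does not say exactly which rule was used. Each pass over minDist is independent per point, which is the part a MapReduce version would most naturally distribute.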