[ 
https://issues.apache.org/jira/browse/MATH-548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044654#comment-13044654
 ] 

Nate Paymer commented on MATH-548:
----------------------------------

The best result would be the one that minimizes the sum of the squared 
distances from each point to the center of its cluster.  See 
http://en.wikipedia.org/wiki/K-means_clustering#Description
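
As a rough illustration of that criterion, here is a minimal sketch of how 
the results of several trials could be scored and compared.  It is not the 
Commons Math API: TrialSelection, SimpleCluster, cost() and best() are 
hypothetical names made up for this example, and points are assumed to be 
plain double[] arrays compared by Euclidean distance.

```java
import java.util.List;

/** Hypothetical helper (not part of Commons Math): scores clusterings and keeps the best. */
final class TrialSelection {

    /** Assumed minimal cluster type: a center plus the points assigned to it. */
    static final class SimpleCluster {
        final double[] center;
        final List<double[]> points;

        SimpleCluster(double[] center, List<double[]> points) {
            this.center = center;
            this.points = points;
        }
    }

    /** Squared Euclidean distance between two points of equal dimension. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Sum over all clusters of the squared distance from each point to its cluster center. */
    static double cost(List<SimpleCluster> clustering) {
        double total = 0.0;
        for (SimpleCluster c : clustering) {
            for (double[] p : c.points) {
                total += squaredDistance(p, c.center);
            }
        }
        return total;
    }

    /** Returns the trial with the lowest cost, or null if no trials were given. */
    static List<SimpleCluster> best(List<List<SimpleCluster>> trials) {
        List<SimpleCluster> best = null;
        double bestCost = Double.POSITIVE_INFINITY;
        for (List<SimpleCluster> trial : trials) {
            double c = cost(trial);
            if (best == null || c < bestCost) {
                best = trial;
                bestCost = c;
            }
        }
        return best;
    }
}
```

A multi-trial cluster() method could collect the result of each trial in a 
list and return best(trials), or keep only the running minimum to avoid 
holding every result in memory.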

> KMeansPlusPlusClusterer should run multiple trials
> --------------------------------------------------
>
>                 Key: MATH-548
>                 URL: https://issues.apache.org/jira/browse/MATH-548
>             Project: Commons Math
>          Issue Type: Improvement
>            Reporter: Nate Paymer
>            Priority: Minor
>
> The interface and documentation for KMeansPlusPlusClusterer imply that a 
> single call to cluster() is sufficient to get the optimal set of clusters.  
> But this isn't true -- practically every client should be calling cluster() 
> multiple times, selecting the best resulting set of clusters.  It seems to me 
> that rather than forcing every client to implement this functionality, it 
> should be placed directly in the KMeansPlusPlusClusterer class.
> I propose adding a new method to KMeansPlusPlusClusterer:
>   List<Cluster<T>> cluster(Collection<T> points, int k, int numTrials,
>                            int maxIterationsPerTrial)
> which calls the existing cluster() method numTrials times, returning the best 
> result.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
