KMeansPlusPlusClusterer should run multiple trials
--------------------------------------------------

                 Key: MATH-548
                 URL: https://issues.apache.org/jira/browse/MATH-548
             Project: Commons Math
          Issue Type: Improvement
            Reporter: Nate Paymer
            Priority: Minor


The interface and documentation for KMeansPlusPlusClusterer imply that a single
call to cluster() is sufficient to get the optimal set of clusters.  But this
isn't true: the initial centers are chosen at random and each run converges
only to a local optimum, so practically every client should be calling
cluster() multiple times and selecting the best resulting set of clusters.  It
seems to me that rather than forcing every client to implement this
functionality, it should be placed directly in the KMeansPlusPlusClusterer
class.

I propose adding a new method to KMeansPlusPlusClusterer:
  List<Cluster<T>> cluster(Collection<T> points, int k, int numTrials,
                           int maxIterationsPerTrial)
which calls the existing cluster() method numTrials times, returning the best
result.
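
To make the idea concrete, here is a minimal sketch of the "run several trials,
keep the best" loop. It is not Commons Math code: the class MultiTrial and the
methods bestOf and withinClusterSumOfSquares are hypothetical names chosen for
illustration, and "best" is assumed to mean the lowest sum of squared distances
from each point to its cluster mean, which is only one possible criterion.

```java
import java.util.List;
import java.util.function.Supplier;
import java.util.function.ToDoubleFunction;

/** Hypothetical helper for illustration; not part of Commons Math. */
final class MultiTrial {
    private MultiTrial() {}

    /**
     * Runs a randomized trial numTrials times and returns the result
     * with the lowest cost. Ties keep the earliest result.
     */
    static <R> R bestOf(int numTrials, Supplier<R> trial, ToDoubleFunction<R> cost) {
        if (numTrials < 1) {
            throw new IllegalArgumentException("numTrials must be >= 1");
        }
        R best = null;
        double bestCost = Double.POSITIVE_INFINITY;
        for (int i = 0; i < numTrials; i++) {
            R candidate = trial.get();
            double c = cost.applyAsDouble(candidate);
            // Always accept the first trial so a result is returned
            // even if every cost is infinite or NaN.
            if (i == 0 || c < bestCost) {
                best = candidate;
                bestCost = c;
            }
        }
        return best;
    }

    /**
     * Assumed quality measure: sum over all clusters of the squared
     * Euclidean distance from each point to its cluster's mean.
     */
    static double withinClusterSumOfSquares(List<List<double[]>> clusters) {
        double total = 0.0;
        for (List<double[]> cluster : clusters) {
            if (cluster.isEmpty()) {
                continue; // an empty cluster contributes nothing
            }
            int dim = cluster.get(0).length;
            double[] mean = new double[dim];
            for (double[] p : cluster) {
                for (int d = 0; d < dim; d++) {
                    mean[d] += p[d] / cluster.size();
                }
            }
            for (double[] p : cluster) {
                for (int d = 0; d < dim; d++) {
                    double diff = p[d] - mean[d];
                    total += diff * diff;
                }
            }
        }
        return total;
    }
}
```

Inside the clusterer, the trial supplier would presumably wrap the existing
cluster() call with maxIterationsPerTrial, and the cost function would score
the returned List<Cluster<T>>. Passing the cost as a parameter keeps the choice
of "best" open for discussion.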

