[ 
https://issues.apache.org/jira/browse/MATH-429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923906#action_12923906
 ] 

Luc Maisonobe edited comment on MATH-429 at 10/22/10 12:48 PM:
---------------------------------------------------------------

You have encountered one classical problem with k-means: at some stage (here at 
the first iteration), one of the clusters becomes empty.
This case is currently not handled by commons-math (which is a bug, so we have 
to fix it).
When a cluster is empty, a new centroid must be defined from the other 
clusters. There are different strategies:

# take the point farthest from any cluster
# select a random point from the cluster with the largest distance variance
# select a random point from the cluster with the largest number of points

My preferred choice would be 2. What do other people think?
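
To make strategy 2 concrete, here is a minimal sketch. It assumes points are plain double[] arrays and clusters are mutable lists; the class EmptyClusterRepair and its method names are hypothetical and are not part of the commons-math API. Removing the chosen point from the donor cluster is also an assumption of this sketch, not something decided in this issue.

```java
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of commons-math: illustrates strategy 2 only.
class EmptyClusterRepair {

    /** Euclidean distance between two points of equal dimension. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Component-wise mean of a non-empty list of points. */
    static double[] centroidOf(List<double[]> points) {
        double[] c = new double[points.get(0).length];
        for (double[] p : points) {
            for (int i = 0; i < c.length; i++) {
                c[i] += p[i];
            }
        }
        for (int i = 0; i < c.length; i++) {
            c[i] /= points.size();
        }
        return c;
    }

    /** Variance of the point-to-centroid distances of a non-empty cluster. */
    static double distanceVariance(List<double[]> points) {
        double[] c = centroidOf(points);
        double mean = 0.0;
        for (double[] p : points) {
            mean += distance(p, c);
        }
        mean /= points.size();
        double variance = 0.0;
        for (double[] p : points) {
            double d = distance(p, c) - mean;
            variance += d * d;
        }
        return variance / points.size();
    }

    /**
     * Strategy 2: find the non-empty cluster with the largest distance
     * variance, remove a random point from it, and return that point so it
     * can serve as the centroid of the empty cluster.
     */
    static double[] pickReplacementCentroid(List<List<double[]>> clusters, Random random) {
        List<double[]> donor = null;
        double maxVariance = Double.NEGATIVE_INFINITY;
        for (List<double[]> cluster : clusters) {
            if (cluster.isEmpty()) {
                continue; // an empty cluster cannot donate a point
            }
            double v = distanceVariance(cluster);
            if (v > maxVariance) {
                maxVariance = v;
                donor = cluster;
            }
        }
        if (donor == null) {
            throw new IllegalStateException("all clusters are empty");
        }
        return donor.remove(random.nextInt(donor.size()));
    }
}
```

One caveat with this sketch: if the donor cluster holds a single point, removing it leaves the donor empty in turn, so a real fix would probably need to guard against that case.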


  
> KMeansPlusPlusClusterer breaks by division by zero
> --------------------------------------------------
>
>                 Key: MATH-429
>                 URL: https://issues.apache.org/jira/browse/MATH-429
>             Project: Commons Math
>          Issue Type: Bug
>    Affects Versions: 2.1
>         Environment: Java, Windows
>            Reporter: Erik van Ingen
>            Priority: Blocker
>         Attachments: KMeansPlusPlusClustererTest.java
>
>   Original Estimate: 3h
>  Remaining Estimate: 3h
>
> For a certain space, KMeansPlusPlusClusterer breaks. This is a blocker 
> because this space occurs in our domain.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.