Hi,

+1 for option 2 based on intuition - I'm not aware of any theoretical
recommendations. Option 4 would be to implement the different
strategies, but that would probably be overkill for now.
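
To make option 2 concrete, here is a rough sketch of how I read it, in plain Java with no commons-math types. The class and method names (EmptyClusterSketch, newCenterForEmptyCluster, and so on) are made up for illustration and are not existing API. It also assumes the chosen point is removed from the donor cluster so it is not counted twice, which the proposal does not actually specify.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical helper for illustration only; not part of commons-math. */
public class EmptyClusterSketch {

    /** Euclidean distance between two points of equal dimension. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Population variance of the point-to-center distances of one cluster. */
    static double distanceVariance(List<double[]> points, double[] center) {
        if (points.isEmpty()) {
            return 0.0;
        }
        double mean = 0.0;
        for (double[] p : points) {
            mean += distance(p, center);
        }
        mean /= points.size();
        double variance = 0.0;
        for (double[] p : points) {
            double delta = distance(p, center) - mean;
            variance += delta * delta;
        }
        return variance / points.size();
    }

    /** Index of the non-empty cluster with the largest distance variance, or -1. */
    static int largestVarianceCluster(List<List<double[]>> clusters,
                                      List<double[]> centers) {
        int best = -1;
        double bestVariance = Double.NEGATIVE_INFINITY;
        for (int k = 0; k < clusters.size(); k++) {
            if (clusters.get(k).isEmpty()) {
                continue; // an empty cluster cannot donate a point
            }
            double v = distanceVariance(clusters.get(k), centers.get(k));
            if (v > bestVariance) {
                bestVariance = v;
                best = k;
            }
        }
        return best;
    }

    /**
     * Option 2: remove a random point from the cluster with the largest
     * distance variance and return it as the new center (assumed behavior).
     */
    static double[] newCenterForEmptyCluster(List<List<double[]>> clusters,
                                             List<double[]> centers,
                                             Random rng) {
        int k = largestVarianceCluster(clusters, centers);
        if (k < 0) {
            throw new IllegalStateException("all clusters are empty");
        }
        List<double[]> donor = clusters.get(k);
        return donor.remove(rng.nextInt(donor.size()));
    }
}
```

The clusterer would call newCenterForEmptyCluster whenever an assignment step leaves a cluster without points, instead of dividing by the zero point count when computing its centroid.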

Cheers, Mikkel.

2010/10/22 Luc Maisonobe (JIRA) <[email protected]>:
>
>    [ https://issues.apache.org/jira/browse/MATH-429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923906#action_12923906 ]
>
> Luc Maisonobe edited comment on MATH-429 at 10/22/10 12:48 PM:
> ---------------------------------------------------------------
>
> You have encountered one classical problem with k-means: at some stage (here 
> at the first iteration), one of the clusters becomes empty.
> This case is currently not handled by commons-math (which is a bug, so we have 
> to fix it).
> When a cluster is empty, a new centroid must be defined from the other 
> clusters. There are different strategies:
>
> 1. take the point farthest from any cluster
> 2. select a random point from the cluster with the largest distance variance
> 3. select a random point from the cluster with the largest number of points
>
> My preferred choice would be 2; what do other people think?
>
>
>> KMeansPlusPlusClusterer breaks by division by zero
>> --------------------------------------------------
>>
>>                 Key: MATH-429
>>                 URL: https://issues.apache.org/jira/browse/MATH-429
>>             Project: Commons Math
>>          Issue Type: Bug
>>    Affects Versions: 2.1
>>         Environment: Java, Windows
>>            Reporter: Erik van Ingen
>>            Priority: Blocker
>>         Attachments: KMeansPlusPlusClustererTest.java
>>
>>   Original Estimate: 3h
>>  Remaining Estimate: 3h
>>
>> For a certain space, KMeansPlusPlusClusterer breaks. This is a blocker 
>> because this space occurs in our domain.
>
