[ https://issues.apache.org/jira/browse/MATH-429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923906#action_12923906 ]
Luc Maisonobe commented on MATH-429:
------------------------------------
You have encountered a classic problem with k-means: at some stage (here at
the first iteration), one of the clusters becomes empty.
This case is currently not handled by commons-math (which is a bug, so we have
to fix it).
When a cluster is empty, a new centroid must be defined from the other
clusters. There are different strategies:
# take the point farthest from any cluster
# select a random point from the cluster with the largest distance variance
# select a random point from the cluster with the largest number of points
My preferred choice would be 2; what do other people think?
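
To make option 2 concrete, here is a minimal sketch in plain Java. It does not
use the commons-math API: the class EmptyClusterRepair and its methods are
hypothetical names, points are bare double[] arrays, and "distance variance" is
assumed to mean the variance of the Euclidean distances from each point to its
cluster's centroid. Removing the chosen point from the donor cluster is also an
assumption, not something settled in this issue.

```java
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of commons-math.
final class EmptyClusterRepair {

    /** Coordinate-wise mean of a non-empty list of points. */
    static double[] centroid(List<double[]> points) {
        double[] c = new double[points.get(0).length];
        for (double[] p : points) {
            for (int i = 0; i < c.length; i++) {
                c[i] += p[i] / points.size();
            }
        }
        return c;
    }

    /** Euclidean distance between two points of equal dimension. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Population variance of the point-to-centroid distances. */
    static double distanceVariance(List<double[]> points) {
        double[] c = centroid(points);
        double mean = 0.0;
        for (double[] p : points) {
            mean += distance(p, c) / points.size();
        }
        double variance = 0.0;
        for (double[] p : points) {
            double dev = distance(p, c) - mean;
            variance += dev * dev / points.size();
        }
        return variance;
    }

    /** Index of the non-empty cluster with the largest distance variance, or -1. */
    static int largestVarianceCluster(List<List<double[]>> clusters) {
        int best = -1;
        double bestVariance = Double.NEGATIVE_INFINITY;
        for (int k = 0; k < clusters.size(); k++) {
            if (clusters.get(k).isEmpty()) {
                continue; // the empty cluster cannot donate a point
            }
            double v = distanceVariance(clusters.get(k));
            if (v > bestVariance) {
                bestVariance = v;
                best = k;
            }
        }
        return best;
    }

    /**
     * Picks a random point from the largest-variance cluster and removes it
     * from that cluster, so it can serve as the centroid of the empty one.
     * The donor list must be mutable.
     */
    static double[] newCentroid(List<List<double[]>> clusters, Random random) {
        int k = largestVarianceCluster(clusters);
        if (k < 0) {
            throw new IllegalStateException("all clusters are empty");
        }
        List<double[]> donor = clusters.get(k);
        return donor.remove(random.nextInt(donor.size()));
    }
}
```

The clustering loop would call newCentroid whenever the assignment step leaves
a cluster without points, then continue iterating with the returned point as
that cluster's centroid.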
> KMeansPlusPlusClusterer breaks by division by zero
> --------------------------------------------------
>
> Key: MATH-429
> URL: https://issues.apache.org/jira/browse/MATH-429
> Project: Commons Math
> Issue Type: Bug
> Affects Versions: 2.1
> Environment: Java, Windows
> Reporter: Erik van Ingen
> Priority: Blocker
> Attachments: KMeansPlusPlusClustererTest.java
>
> Original Estimate: 3h
> Remaining Estimate: 3h
>
> For a certain space, KMeansPlusPlusClusterer breaks. This is a blocker
> because this space occurs in our domain.