Hi, +1 for option 2 based on intuition - I'm not aware of any theoretical recommendations. Option 4 would be to implement the different strategies, but that would probably be overkill for now.
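
To make option 2 concrete (take a random point from the cluster with the largest distance variance), here is a rough sketch in plain Java. It is only an illustration under my own assumptions, not the commons-math implementation: the class `EmptyClusterRepair` and all of its method names are hypothetical, points are bare `double[]` arrays, and "distance variance" is taken to mean the variance of the Euclidean distances between a cluster's points and its centroid.

```java
import java.util.List;
import java.util.Random;

/** Hypothetical helper for illustration only; not part of commons-math. */
final class EmptyClusterRepair {

    private EmptyClusterRepair() { }

    /** Component-wise mean of a non-empty list of points. */
    static double[] centroid(List<double[]> points) {
        double[] c = new double[points.get(0).length];
        for (double[] p : points) {
            for (int i = 0; i < c.length; i++) {
                c[i] += p[i] / points.size();
            }
        }
        return c;
    }

    /** Population variance of the distances between the points and their centroid. */
    static double distanceVariance(List<double[]> points) {
        double[] c = centroid(points);
        double sum = 0.0;
        double sumSq = 0.0;
        for (double[] p : points) {
            double d2 = 0.0;
            for (int i = 0; i < c.length; i++) {
                d2 += (p[i] - c[i]) * (p[i] - c[i]);
            }
            sum += Math.sqrt(d2);
            sumSq += d2;
        }
        double mean = sum / points.size();
        // Clamp at zero to absorb rounding error in E[d^2] - E[d]^2.
        return Math.max(0.0, sumSq / points.size() - mean * mean);
    }

    /**
     * Removes a random point from the cluster with the largest distance
     * variance and returns it, so that it can seed the empty cluster.
     * Clusters with fewer than two points are skipped (an assumption of this
     * sketch) so that the repair does not create another empty cluster.
     */
    static double[] pointFromLargestVarianceCluster(List<List<double[]>> clusters,
                                                    Random random) {
        List<double[]> selected = null;
        double maxVariance = Double.NEGATIVE_INFINITY;
        for (List<double[]> cluster : clusters) {
            if (cluster.size() < 2) {
                continue;
            }
            double variance = distanceVariance(cluster);
            if (variance > maxVariance) {
                maxVariance = variance;
                selected = cluster;
            }
        }
        if (selected == null) {
            throw new IllegalStateException("no cluster with at least two points");
        }
        return selected.remove(random.nextInt(selected.size()));
    }
}
```

The returned point would become the centroid of the empty cluster before the next assignment step. Removing it from its old cluster is a choice of this sketch; since all points are reassigned in the next iteration anyway, it may not matter in practice.
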
Cheers,
Mikkel

2010/10/22 Luc Maisonobe (JIRA) <[email protected]>:
>
>    [ https://issues.apache.org/jira/browse/MATH-429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923906#action_12923906 ]
>
> Luc Maisonobe edited comment on MATH-429 at 10/22/10 12:48 PM:
> ---------------------------------------------------------------
>
> You have encountered one classical problem with k-means: at some stage (here
> at the first iteration), one of the clusters becomes empty.
> This case is currently not handled by commons-math (which is a bug, so we have
> to fix it).
> When a cluster is empty, a new centroid must be defined from the other
> clusters. There are different strategies:
>
> # take the point farthest from any cluster
> # select a random point from the cluster with the largest distance variance
> # select a random point from the cluster with the largest number of points
>
> My preferred choice would be 2, what do other people think?
>
>
>> KMeansPlusPlusClusterer breaks by division by zero
>> --------------------------------------------------
>>
>>                 Key: MATH-429
>>                 URL: https://issues.apache.org/jira/browse/MATH-429
>>             Project: Commons Math
>>          Issue Type: Bug
>>    Affects Versions: 2.1
>>         Environment: Java, Windows
>>            Reporter: Erik van Ingen
>>            Priority: Blocker
>>         Attachments: KMeansPlusPlusClustererTest.java
>>
>>   Original Estimate: 3h
>>  Remaining Estimate: 3h
>>
>> For a certain space, KMeansPlusPlusClusterer breaks. This is a blocker
>> because this space occurs in our domain.
