Thanks for the replies guys! @Aljoscha: I get your point, but I would actually expect either an error message or the lonely centroid to move. k-means is supposed to cluster data in k clusters. If you end up with < k, something must have gone wrong.. :s
@Alex: very helpful resource, thanks. I will probably use one of these strategies. Cheers, V. On 25 February 2015 at 15:36, Alexander Alexandrov < alexander.s.alexand...@gmail.com> wrote: > Apache's commons-math implementation offers various strategies for handling > this scenarios: > > > http://commons.apache.org/proper/commons-math/jacoco/org.apache.commons.math3.stat.clustering/KMeansPlusPlusClusterer.java.html > > (take a look at the EmptyClusterStrategy enum options) > > 2015-02-24 23:28 GMT+01:00 Aljoscha Krettek <aljos...@apache.org>: > > > I think the behaviour is correct. If a cluster has not points then it > > has no centroid. If it has no centroid no points could ever be > > assigned to it again in the future since there is no way of > > calculating a distance. > > > > On Tue, Feb 24, 2015 at 6:57 PM, Vasiliki Kalavri > > <vasilikikala...@gmail.com> wrote: > > > Hello everyone, > > > > > > I'm using the k-means example as basis for a custom implementation and > I > > > noticed the following behavior: If during an iteration no point is > > assigned > > > to a particular cluster, this cluster will then "disappear". > > > This happens because SelectNearestCenter() outputs <centroidId, point> > > > tuples, (where centroidId is the chosen center by the point) and these > > are > > > then grouped by centroidId to compute the new centers. If no point > > selects > > > a particular centroid, this centroid will not appear in subsequent > > > iterations. > > > > > > For example, assume we have the points > > > { (-10, 0), (-8, 0), (2, 0) } and the initial centroids {1, (0, 0)} and > > {2, > > > (5, 0)}. > > > Initially, point (2, 0) will be assigned to centroid 1, but then after > > > centroid 1 moves closer to (-10, 0) point(2, 0) will not be reassigned > to > > > cluster 2. > > > > > > Is this intended behavior? > > > This seemed odd to me, but I couldn't really find any resources that > > define > > > the "correct" behavior.. It seems that handling such a situation is > > > implementation-specific. I think that if we keep it this way, we might > > want > > > to add a comment in the example though :) > > > > > > Cheers, > > > V. > > >