Thanks for the replies guys!

@Aljoscha: I get your point, but I would actually expect either an error
message or the lonely centroid to move.
k-means is supposed to cluster data in k clusters. If you end up with < k,
something must have gone wrong.. :s

@Alex: very helpful resource, thanks. I will probably use one of these
strategies.

Cheers,
V.

On 25 February 2015 at 15:36, Alexander Alexandrov <
alexander.s.alexand...@gmail.com> wrote:

> Apache's commons-math implementation offers various strategies for handling
> this scenarios:
>
>
> http://commons.apache.org/proper/commons-math/jacoco/org.apache.commons.math3.stat.clustering/KMeansPlusPlusClusterer.java.html
>
> (take a look at the EmptyClusterStrategy enum options)
>
> 2015-02-24 23:28 GMT+01:00 Aljoscha Krettek <aljos...@apache.org>:
>
> > I think the behaviour is correct. If a cluster has not points then it
> > has no centroid. If it has no centroid no points could ever be
> > assigned to it again in the future since there is no way of
> > calculating a distance.
> >
> > On Tue, Feb 24, 2015 at 6:57 PM, Vasiliki Kalavri
> > <vasilikikala...@gmail.com> wrote:
> > > Hello everyone,
> > >
> > > I'm using the k-means example as basis for a custom implementation and
> I
> > > noticed the following behavior: If during an iteration no point is
> > assigned
> > > to a particular cluster, this cluster will then "disappear".
> > > This happens because SelectNearestCenter() outputs <centroidId, point>
> > > tuples, (where centroidId is the chosen center by the point) and these
> > are
> > > then grouped by centroidId to compute the new centers. If no point
> > selects
> > > a particular centroid, this centroid will not appear in subsequent
> > > iterations.
> > >
> > > For example, assume we have the points
> > > { (-10, 0), (-8, 0), (2, 0) } and the initial centroids {1, (0, 0)} and
> > {2,
> > > (5, 0)}.
> > > Initially, point (2, 0) will be assigned to centroid 1, but then after
> > > centroid 1 moves closer to (-10, 0) point(2, 0) will not be reassigned
> to
> > > cluster 2.
> > >
> > > Is this intended behavior?
> > > This seemed odd to me, but I couldn't really find any resources that
> > define
> > > the "correct" behavior.. It seems that handling such a situation is
> > > implementation-specific. I think that if we keep it this way, we might
> > want
> > > to add a comment in the example though :)
> > >
> > > Cheers,
> > > V.
> >
>

Reply via email to