It is not unreasonable for the cluster centers to contain nonzero values for many terms. Consider a 2-d, x-y clustering as in the DisplayKMeans example: every cluster center contains nonzero x and y values. The centers themselves, taken as a whole, should be distinct, and one would not expect their 3-sigma circles to overlap much. The clustered points output by the classification step will certainly each be assigned to a single cluster, since k-means is a maximum likelihood clustering algorithm.
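To make the point concrete, here is a minimal pure-Python sketch of k-means on a tiny made-up corpus. It is not Mahout's implementation; the term names, the weights, and the helper names `nearest` and `kmeans` are all hypothetical, chosen only to show that a term such as "animal" can carry a nonzero weight in both centers while every document is still assigned to exactly one cluster.

```python
import math

TERMS = ("animal", "boat", "kitty")  # hypothetical vocabulary

# Hypothetical term-weight vectors, one per document (not real data).
docs = [
    (0.1, 3.5, 0.0),  # mostly "boat"
    (0.0, 3.0, 0.2),
    (0.2, 0.0, 3.4),  # mostly "kitty"
    (0.1, 0.3, 3.0),
]

def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

def kmeans(points, centers, iterations=10):
    """Plain Lloyd iterations: assign to nearest center, then recompute means."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # An empty cluster keeps its previous center.
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else old
            for g, old in zip(groups, centers)
        ]
    return centers

centers = kmeans(docs, centers=[docs[0], docs[2]])
assignments = [nearest(d, centers) for d in docs]

for i, c in enumerate(centers):
    print(f"Cl{i + 1}", dict(zip(TERMS, c)))
print("assignments:", assignments)
```

In this sketch each number printed next to a term is that center's coordinate along the term's dimension, i.e. the mean weight of the term over the documents in the cluster. Both centers list every term, but the assignments list gives each document a single cluster index.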
-----Original Message-----
From: djellel eddine Difallah [mailto:[email protected]]
Sent: Friday, June 17, 2011 11:45 AM
To: [email protected]
Subject: Re: kmeans generates ovelapping clusters

Ok. However, if the measure appended to each term has something to do with the distance then I have found that it is different for terms that are in both cluster.
Example:
Cl1 { .... animal:0.001, ....}
Cl2 { .... animal:0.07, ....}

What does it mean exactly ?

2011/6/17 Hector Yee <[email protected]>

> One vector be a member of only one cluster but there's no requirement for no
> overlaps.
> You get equal radius but the cluster centers could be close enough for them
> to overlap.
>
> On Fri, Jun 17, 2011 at 10:15 AM, djellel eddine Difallah <
> [email protected]> wrote:
>
> > Hello everyone,
> >
> > I tried kmeans on some corpus with the same script as reuters but with -k 2
> >
> > There are some terms in both generated clusters. In addition terms in a
> > cluster have a measure .. somthing like { .... animal:0.087, boat:3.559,
> > kitty:3.386, .....}
> >
> > Isn't kmeans supposed to generate non overlapping clusters? and what does
> > that annotated measures mean?
> >
> > Thanks !
> >
> > Djellel
> >
>
>
>
> --
> Yee Yang Li Hector
> http://hectorgon.blogspot.com/ (tech + travel)
> http://hectorgon.com (book reviews)
>
