The DistanceMeasure instance in each DistanceMeasureCluster is used to compute 
the pdf() that is required by the Model interface. To evaluate the pdf for an 
arbitrary vector, the distance measure that was used to form the cluster in the 
first place is needed. The ClusterClassifier uses pdf() to classify a Vector, 
but pdf() is not otherwise used by kmeans. The measure itself contains no state, 
so it is not "the distance to the cluster center". It can, however, be used to 
calculate the distance of an arbitrary point from the cluster center, as is 
done in pdf().
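
To make the relationship concrete, here is a minimal sketch in Python rather
than Mahout's own Java. All names (euclidean_distance, DistanceCluster,
classify) are hypothetical, and the 1 / (1 + distance) form of pdf() is an
assumption chosen only because it decreases with distance; it is not presented
as the library's actual code. The point is that the measure is a stateless
function, while the cluster holds the center and applies the measure to
whatever point it is given.

```python
import math


def euclidean_distance(a, b):
    """Stateless distance measure: the result depends only on the arguments."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class DistanceCluster:
    """Hypothetical stand-in for a distance-measure cluster (not Mahout's class)."""

    def __init__(self, center, measure):
        self.center = center    # the state lives in the cluster
        self.measure = measure  # the measure itself carries no state

    def pdf(self, point):
        # Assumed form: a score that decays as the point moves away from the center.
        return 1.0 / (1.0 + self.measure(point, self.center))


def classify(point, clusters):
    """Return the index of the single cluster with the highest pdf for the point."""
    scores = [c.pdf(point) for c in clusters]
    return scores.index(max(scores))


# Both centers have nonzero values in every dimension, as in a 2-d x-y example.
clusters = [
    DistanceCluster([1.0, 1.0], euclidean_distance),
    DistanceCluster([4.0, 5.0], euclidean_distance),
]

print(classify([1.0, 2.0], clusters))
```

Under these assumptions a point still lands in exactly one cluster, even
though both centers have nonzero weights on the same dimensions.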

Hope this helps,
Jeff


-----Original Message-----
From: djellel eddine Difallah [mailto:[email protected]] 
Sent: Friday, June 17, 2011 1:16 PM
To: [email protected]
Subject: Re: kmeans generates ovelapping clusters

How about the measure that is now appended to the points? Can you confirm
that it is the distance to the cluster center?

Thanks

2011/6/17 Jeff Eastman <[email protected]>

> It is not unreasonable for the cluster centers to contain nonzero values
> for many terms. Consider a 2-d, x-y clustering as in the DisplayKMeans
> example. Every cluster center contains nonzero x and y values. The centers
> themselves, taken as a whole, should be distinct and one would not expect
> their 3sigma circles to overlap much. Certainly, the clustered points output
> by the classification step will be assigned to a single cluster, since
> kmeans is a maximum likelihood clustering algorithm.
>

The output in my case, generated with clusterdump, puts the same point in
both of my clusters ...


>
> -----Original Message-----
> From: djellel eddine Difallah [mailto:[email protected]]
> Sent: Friday, June 17, 2011 11:45 AM
> To: [email protected]
> Subject: Re: kmeans generates ovelapping clusters
>
> Ok. However, if the measure appended to each term has something to do with
> the distance, then I have found that it is different for terms that are in
> both clusters. Example:
> Cl1 { .... animal:0.001, ....}
> Cl2 { .... animal:0.07, ....}
>
> What does it mean exactly ?
>
> 2011/6/17 Hector Yee <[email protected]>
>
> > One vector can be a member of only one cluster, but there's no requirement
> > for no overlaps.
> > You get equal radius, but the cluster centers could be close enough for
> > them to overlap.
> >
> > On Fri, Jun 17, 2011 at 10:15 AM, djellel eddine Difallah <
> > [email protected]> wrote:
> >
> > > Hello everyone,
> > >
> > > I tried kmeans on some corpus with the same script as reuters but with
> > > -k 2
> > > There are some terms in both generated clusters. In addition, terms in a
> > > cluster have a measure, something like { .... animal:0.087, boat:3.559,
> > > kitty:3.386, .....}
> > >
> > > Isn't kmeans supposed to generate non-overlapping clusters? And what do
> > > those annotated measures mean?
> > >
> > > Thanks !
> > >
> > > Djellel
> > >
> >
> >
> >
> > --
> > Yee Yang Li Hector
> > http://hectorgon.blogspot.com/ (tech + travel)
> > http://hectorgon.com (book reviews)
> >
>
