Mostly. Clustering assigns each point to one or more clusters, using the 
distance measure or model pdf to do so. The distance from each point to 
the cluster center is therefore calculated in this step, but it is thrown 
away once the assignments are made. This information could be written to 
another file, or a different version could output the distance directly 
instead of the pdf. I don't know what that would mean for Dirichlet, 
however, since it only works with pdf values.
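
To make the idea concrete, here is a rough sketch (plain Python, not Mahout 
code) of a hard-assignment step that keeps the distance instead of discarding 
it. The names euclidean and assign_with_distance are made up for illustration, 
and Euclidean distance is only an assumption; any distance measure could be 
passed in.

```python
import math

def euclidean(a, b):
    # Assumed distance measure; swap in whichever measure the clustering used.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign_with_distance(points, centers, distance=euclidean):
    """Return one (cluster_index, weight, distance) tuple per point.

    The weight is always 1.0 because this is a hard, k-means style
    assignment; the distance to the chosen center is kept alongside it
    rather than being thrown away.
    """
    results = []
    for p in points:
        dists = [distance(p, c) for c in centers]
        best = min(range(len(centers)), key=dists.__getitem__)
        results.append((best, 1.0, dists[best]))
    return results

centers = [(0.0, 0.0), (10.0, 0.0)]
points = [(3.0, 4.0), (10.0, 1.0)]
assignments = assign_with_distance(points, centers)
```

Keeping the weight and the distance as separate fields leaves the weight 
meaning "probability of membership" for all the clusterers, which is why I 
would not overload it with the distance.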

-----Original Message-----
From: Grant Ingersoll [mailto:[email protected]] 
Sent: Wednesday, July 13, 2011 1:36 PM
To: [email protected]
Subject: Re: Emitting distance from centroid for K-Means

Isn't --clustering the post processing step that already does it?

On Jul 13, 2011, at 4:31 PM, Jeff Eastman wrote:

> Well, distance is dependent upon the distance measure you want to use. A 
> post-processing step could easily calculate this. The ClusterEvaluator may 
> have some methods that could be useful. It calculates a set of representative 
> points for each cluster and calculates interCluster and intraCluster 
> densities from that. 
> 
> -----Original Message-----
> From: Grant Ingersoll [mailto:[email protected]] 
> Sent: Wednesday, July 13, 2011 1:28 PM
> To: [email protected]
> Subject: Re: Emitting distance from centroid for K-Means
> 
> Good to know.  Next question, what's the preferred way, then, to get out 
> either the distance or what Ted said?
> 
> -Grant
> 
> On Jul 13, 2011, at 4:25 PM, Ted Dunning wrote:
> 
>> I take back what I said.
>> 
>> Jeff is correct.
>> 
>> On Wed, Jul 13, 2011 at 1:23 PM, Jeff Eastman <[email protected]> wrote:
>> 
>>> The weight is the probability the vector is a member of the cluster. For
>>> FuzzyK and Dirichlet it is fractional, for KMeans it is 1 as the algorithm
>>> is maximum likelihood and each point is only assigned to a single cluster.
>>> 
>>> -----Original Message-----
>>> From: Grant Ingersoll [mailto:[email protected]]
>>> Sent: Wednesday, July 13, 2011 1:11 PM
>>> To: [email protected]
>>> Subject: Emitting distance from centroid for K-Means
>>> 
>>> Does it make sense to output the distance to the cluster as the weight in
>>> the KMeansClusterer.outputPointWithClusterInfo method instead of 1?  What's
>>> the purpose of the 1 as the weight?
>>> 
>>> -Grant
>>> 
>>> 
>>> 
> 
> --------------------------
> Grant Ingersoll
> 
> 
> 

--------------------------
Grant Ingersoll


