I put up a patch. Does it look reasonable? I'm not totally thrilled by it, 
but it is a start.

On a related note, is there any reason why the input seeds can't be Vectors as 
an alternative to Cluster?

-Grant

On Jul 13, 2011, at 5:38 PM, Jeff Eastman wrote:

> Mostly. Clustering assigns points to one or more clusters, and it uses the 
> distance measure or model pdf to do this. So the distance from each point to 
> the cluster center is calculated in this step but thrown away once the 
> assignments are made. This information could be output to another file, 
> or a different version could output the distance directly instead of the pdf. 
> I don't know what that would mean for Dirichlet, however, since it only 
> works with pdf values.
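
To make the "calculated but thrown away" point concrete, here is a minimal sketch of an assignment step that keeps the winning distance instead of discarding it. It is plain Python rather than Mahout code: the names `euclidean` and `assign_with_distance` are made up for illustration, and Euclidean distance is only an assumed default, since the real measure is whatever the job was configured with.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign_with_distance(points, centroids, distance=euclidean):
    """Assign each point to its nearest centroid and keep the distance.

    Returns a list of (cluster_index, distance, point) tuples.
    Hypothetical helper, not part of any Mahout API.
    """
    results = []
    for point in points:
        # Distance to every centroid; a hard assignment normally keeps
        # only the index of the minimum and drops the value itself.
        distances = [distance(point, c) for c in centroids]
        best = min(range(len(centroids)), key=distances.__getitem__)
        results.append((best, distances[best], point))
    return results

centroids = [(0.0, 0.0), (10.0, 10.0)]
points = [(3.0, 4.0), (10.0, 7.0)]
for cluster, dist, point in assign_with_distance(points, centroids):
    print(cluster, dist, point)
```

Writing the second tuple element out next to the cluster id would be one way to emit the distance to another file without touching the weight.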
> 
> -----Original Message-----
> From: Grant Ingersoll [mailto:[email protected]] 
> Sent: Wednesday, July 13, 2011 1:36 PM
> To: [email protected]
> Subject: Re: Emitting distance from centroid for K-Means
> 
> Isn't --clustering the post-processing step that already does it?
> 
> On Jul 13, 2011, at 4:31 PM, Jeff Eastman wrote:
> 
>> Well, distance is dependent upon the distance measure you want to use. A 
>> post-processing step could easily calculate this. The ClusterEvaluator may 
>> have some methods that could be useful. It calculates a set of 
>> representative points for each cluster and calculates interCluster and 
>> intraCluster densities from that. 
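
As a small illustration of why the emitted value depends on the distance measure, the sketch below scores the same point and centroid under two common measures. The function names are hypothetical and this is not the ClusterEvaluator logic, just two textbook definitions.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

point = (2.0, 0.0)
centroid = (4.0, 0.0)

# Same pair of vectors, two different answers.
print(euclidean_distance(point, centroid))  # 2.0
print(cosine_distance(point, centroid))     # 0.0 (same direction)
```

So any post-processing step that reports a distance would need to be told which measure to use, presumably the one the clustering run itself used.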
>> 
>> -----Original Message-----
>> From: Grant Ingersoll [mailto:[email protected]] 
>> Sent: Wednesday, July 13, 2011 1:28 PM
>> To: [email protected]
>> Subject: Re: Emitting distance from centroid for K-Means
>> 
>> Good to know.  Next question, what's the preferred way, then, to get out 
>> either the distance or what Ted said?
>> 
>> -Grant
>> 
>> On Jul 13, 2011, at 4:25 PM, Ted Dunning wrote:
>> 
>>> I take back what I said.
>>> 
>>> Jeff is correct.
>>> 
>>> On Wed, Jul 13, 2011 at 1:23 PM, Jeff Eastman <[email protected]> wrote:
>>> 
>>>> The weight is the probability that the vector is a member of the cluster.
>>>> For FuzzyK and Dirichlet it is fractional; for KMeans it is 1, as the
>>>> algorithm is maximum likelihood and each point is assigned to only a
>>>> single cluster.
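
A rough sketch of the difference between a hard weight of 1 and a fractional weight, given a point's distances to each cluster center. The fuzzy version uses the standard fuzzy c-means membership formula with fuzziness `m`; whether Mahout computes its FuzzyK weights in exactly this way is an assumption here, and both function names are made up for the example.

```python
def hard_weights(distances):
    """K-Means style: weight 1 for the nearest cluster, 0 for the rest."""
    nearest = min(range(len(distances)), key=distances.__getitem__)
    return [1.0 if i == nearest else 0.0 for i in range(len(distances))]

def fuzzy_weights(distances, m=2.0):
    """Standard fuzzy c-means membership.

    Assumes every distance is > 0 and m > 1. The returned weights
    are fractional and sum to 1 across clusters.
    """
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
        for d_i in distances
    ]

distances = [1.0, 3.0]           # distances from one point to two centers
print(hard_weights(distances))   # [1.0, 0.0]
print(fuzzy_weights(distances))  # roughly [0.9, 0.1]
```

In the hard case the weight carries no information beyond the assignment itself, which is why the distance would have to be emitted separately rather than read back from the weight.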
>>>> 
>>>> -----Original Message-----
>>>> From: Grant Ingersoll [mailto:[email protected]]
>>>> Sent: Wednesday, July 13, 2011 1:11 PM
>>>> To: [email protected]
>>>> Subject: Emitting distance from centroid for K-Means
>>>> 
>>>> Does it make sense to output the distance to the cluster as the weight in
>>>> the KMeansClusterer.outputPointWithClusterInfo method instead of 1?  What's
>>>> the purpose of the 1 as the weight?
>>>> 
>>>> -Grant
>>>> 
>>>> 
>>>> 

--------------------------
Grant Ingersoll


