-----Original Message-----
From: Grant Ingersoll [mailto:[email protected]] 
Sent: Wednesday, July 13, 2011 5:35 PM
To: [email protected]
Subject: Re: Emitting distance from centroid for K-Means


On Jul 13, 2011, at 6:42 PM, Jeff Eastman wrote:

> +1 Patch looks reasonable enough. You'd need to modify the other clustering 
> algorithms to achieve uniformity.

Not sure if it needs uniformity, but I can.  As you pointed out, some of the 
other implementations don't have the same info, so they need not go to the 
trouble of doing it.   Also, the change is only on output of --clustering, so 
it shouldn't affect the iterations, right?
[jeff] Right, uniformity where appropriate would also be nice.

> 
> The assumption about input seeds originally came from using Canopy to prime 
> KMeans but it has become the prior set of clusters since the algorithms have 
> converged on common formats & models. Each iteration reads in the set of 
> clusters-n and outputs clusters-n+1, so changing this would have broad 
> impact. FuzzyK and Dirichlet use the same iteration semantics and the 
> ClusterIterator depends on this for unification with classification 
> interfaces.
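The iteration contract Jeff describes (read clusters-n, write clusters-n+1, with the seeds simply acting as clusters-0) can be sketched in memory as a single K-Means step applied repeatedly. This is a minimal sketch, assuming squared Euclidean distance and plain `double[]` centers; the class and method names are hypothetical and are not Mahout's API. It also ignores the Hadoop clusters-n directories entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not Mahout's API.
public class KMeansIterationSketch {

    // Squared Euclidean distance, standing in for whatever distance measure is configured.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Index of the center nearest to the point.
    static int nearest(double[] point, List<double[]> centers) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centers.size(); c++) {
            double d = squaredDistance(point, centers.get(c));
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        return best;
    }

    // One iteration: take the prior clusters ("clusters-n") and produce the next set ("clusters-n+1").
    static List<double[]> iterate(List<double[]> priorCenters, List<double[]> points) {
        int k = priorCenters.size();
        int dim = priorCenters.get(0).length;
        double[][] sums = new double[k][dim];
        int[] counts = new int[k];
        for (double[] p : points) {
            int c = nearest(p, priorCenters);
            counts[c]++;
            for (int i = 0; i < dim; i++) {
                sums[c][i] += p[i];
            }
        }
        List<double[]> next = new ArrayList<>();
        for (int c = 0; c < k; c++) {
            if (counts[c] == 0) {
                // An empty cluster keeps its prior center instead of dividing by zero.
                next.add(priorCenters.get(c).clone());
            } else {
                double[] center = new double[dim];
                for (int i = 0; i < dim; i++) {
                    center[i] = sums[c][i] / counts[c];
                }
                next.add(center);
            }
        }
        return next;
    }

    public static void main(String[] args) {
        List<double[]> points = List.of(
            new double[] {0, 0}, new double[] {0, 2},
            new double[] {10, 10}, new double[] {10, 12});
        // The seeds play the role of clusters-0 (e.g., Canopy output).
        List<double[]> clusters = List.of(new double[] {1, 1}, new double[] {9, 9});
        for (int n = 0; n < 3; n++) {
            clusters = iterate(clusters, points);
        }
        for (double[] c : clusters) {
            System.out.println(Arrays.toString(c));
        }
    }
}
```

Because every iteration has the same input and output type, changing what the seeds are (e.g., raw Vectors instead of Clusters) would ripple through every iteration, which is the broad impact described above.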



> 
> -----Original Message-----
> From: Grant Ingersoll [mailto:[email protected]] 
> Sent: Wednesday, July 13, 2011 3:08 PM
> To: [email protected]
> Subject: Re: Emitting distance from centroid for K-Means
> 
> I put up a patch, do you think that it looks reasonable?  I'm not totally 
> thrilled by it, but it is a start.
> 
> On a related note, is there any reason why the input seeds can't be Vectors 
> as an alternative to Cluster?
> 
> -Grant
> 
> On Jul 13, 2011, at 5:38 PM, Jeff Eastman wrote:
> 
>> Mostly. Clustering assigns points to one or more clusters, and it uses the 
>> distance measure or model pdf to do this. So the distance from each point to 
>> the cluster center is calculated in this step but thrown away once the 
>> assignment(s) are made. This information could be output to another file, 
>> or a different version could output the distance directly instead of the 
>> pdf. I don't know what that would mean for Dirichlet, however, since it only 
>> plays with pdf values.
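To make the "calculated but thrown away" point concrete, here is a minimal sketch of a hard (K-Means style) assignment step that keeps the winning distance alongside the weight of 1 instead of discarding it. It assumes Euclidean distance and uses a hypothetical `Assignment` record; it does not reproduce `KMeansClusterer.outputPointWithClusterInfo` or any Mahout output format.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not Mahout's API or output format.
public class AssignWithDistance {

    // Hypothetical output record: assigned cluster, membership weight, and distance to its center.
    static final class Assignment {
        final int clusterId;
        final double weight;
        final double distance;

        Assignment(int clusterId, double weight, double distance) {
            this.clusterId = clusterId;
            this.weight = weight;
            this.distance = distance;
        }

        @Override
        public String toString() {
            return "cluster=" + clusterId + " weight=" + weight + " distance=" + distance;
        }
    }

    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Hard assignment: the distance to the winning center is already computed here,
    // so it is kept in the output rather than discarded.
    static Assignment assign(double[] point, List<double[]> centers) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centers.size(); c++) {
            double d = euclidean(point, centers.get(c));
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        // Weight stays 1 (membership), distance travels as a separate field.
        return new Assignment(best, 1.0, bestDist);
    }

    public static void main(String[] args) {
        List<double[]> centers = List.of(new double[] {0, 0}, new double[] {10, 10});
        List<Assignment> out = new ArrayList<>();
        out.add(assign(new double[] {3, 4}, centers));
        out.add(assign(new double[] {10, 7}, centers));
        out.forEach(System.out::println);
    }
}
```

Keeping the distance in its own field, rather than overloading the weight, preserves the weight's meaning as a membership probability. A pdf-based clusterer such as Dirichlet would have no distance to put there, which matches the caveat above.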
>> 
>> -----Original Message-----
>> From: Grant Ingersoll [mailto:[email protected]] 
>> Sent: Wednesday, July 13, 2011 1:36 PM
>> To: [email protected]
>> Subject: Re: Emitting distance from centroid for K-Means
>> 
>> Isn't --clustering the post processing step that already does it?
>> 
>> On Jul 13, 2011, at 4:31 PM, Jeff Eastman wrote:
>> 
>>> Well, distance is dependent upon the distance measure you want to use. A 
>>> post-processing step could easily calculate this. The ClusterEvaluator may 
>>> have some methods that could be useful. It calculates a set of 
>>> representative points for each cluster and calculates interCluster and 
>>> intraCluster densities from that. 
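Since the reported distance depends on the chosen measure, a post-processing step would need the measure passed in explicitly. This is a minimal sketch, assuming each point has already been paired with its assigned centroid (e.g., read back from the clustering output); the names are hypothetical, it uses `java.util.function.ToDoubleBiFunction` in place of Mahout's own distance measure classes, and it does not use ClusterEvaluator.

```java
import java.util.function.ToDoubleBiFunction;

// Hypothetical illustration only; not Mahout's DistanceMeasure or ClusterEvaluator API.
public class PostProcessDistance {

    // Two example measures; the "distance" reported depends on which one is used.
    static final ToDoubleBiFunction<double[], double[]> EUCLIDEAN = (a, b) -> {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    };

    static final ToDoubleBiFunction<double[], double[]> MANHATTAN = (a, b) -> {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    };

    // Distance from a point to the centroid of the cluster it was assigned to,
    // under whichever measure the caller supplies.
    static double distanceToCentroid(double[] point, double[] centroid,
                                     ToDoubleBiFunction<double[], double[]> measure) {
        return measure.applyAsDouble(point, centroid);
    }

    public static void main(String[] args) {
        double[] centroid = {0, 0};
        double[] point = {3, 4};
        System.out.println("euclidean: " + distanceToCentroid(point, centroid, EUCLIDEAN));
        System.out.println("manhattan: " + distanceToCentroid(point, centroid, MANHATTAN));
    }
}
```

The same point and centroid yield different distances under the two measures, which is why the post-processing step should use the same measure the clustering run used.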
>>> 
>>> -----Original Message-----
>>> From: Grant Ingersoll [mailto:[email protected]] 
>>> Sent: Wednesday, July 13, 2011 1:28 PM
>>> To: [email protected]
>>> Subject: Re: Emitting distance from centroid for K-Means
>>> 
>>> Good to know.  Next question, what's the preferred way, then, to get out 
>>> either the distance or what Ted said?
>>> 
>>> -Grant
>>> 
>>> On Jul 13, 2011, at 4:25 PM, Ted Dunning wrote:
>>> 
>>>> I take back what I said.
>>>> 
>>>> Jeff is correct.
>>>> 
>>>> On Wed, Jul 13, 2011 at 1:23 PM, Jeff Eastman <[email protected]> wrote:
>>>> 
>>>>> The weight is the probability the vector is a member of the cluster. For
>>>>> FuzzyK and Dirichlet it is fractional; for KMeans it is 1, as the algorithm
>>>>> is maximum likelihood and each point is assigned to only a single cluster.
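The difference between fractional and unit weights can be illustrated with the textbook fuzzy c-means membership formula next to a hard assignment. This is a minimal sketch, assuming the standard formula w_j = 1 / sum_c (d_j / d_c)^(2/(m-1)) with fuzziness m > 1; Mahout's FuzzyK implementation may differ in details, Dirichlet's pdf-based weights are not shown, and the names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical illustration only; not Mahout's FuzzyKMeans or KMeans code.
public class MembershipWeights {

    // Textbook fuzzy c-means membership of one point in each cluster, given its
    // distances to every center and fuzziness m > 1. The weights sum to 1.
    static double[] fuzzyWeights(double[] distances, double m) {
        int k = distances.length;
        double[] w = new double[k];
        // A point sitting exactly on a center gets full membership in that cluster.
        for (int j = 0; j < k; j++) {
            if (distances[j] == 0.0) {
                w[j] = 1.0;
                return w;
            }
        }
        double exponent = 2.0 / (m - 1.0);
        for (int j = 0; j < k; j++) {
            double denom = 0.0;
            for (int c = 0; c < k; c++) {
                denom += Math.pow(distances[j] / distances[c], exponent);
            }
            w[j] = 1.0 / denom;
        }
        return w;
    }

    // K-Means style hard assignment: the nearest cluster gets weight 1, all others 0.
    static double[] hardWeights(double[] distances) {
        int best = 0;
        for (int j = 1; j < distances.length; j++) {
            if (distances[j] < distances[best]) {
                best = j;
            }
        }
        double[] w = new double[distances.length];
        w[best] = 1.0;
        return w;
    }

    public static void main(String[] args) {
        double[] distances = {1.0, 3.0};
        System.out.println("fuzzy (m=2): " + Arrays.toString(fuzzyWeights(distances, 2.0)));
        System.out.println("hard:        " + Arrays.toString(hardWeights(distances)));
    }
}
```

With distances 1 and 3 and m = 2, the fuzzy weights come out to roughly 0.9 and 0.1, while the hard assignment gives 1 and 0, which is the "weight of 1" asked about below.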
>>>>> 
>>>>> -----Original Message-----
>>>>> From: Grant Ingersoll [mailto:[email protected]]
>>>>> Sent: Wednesday, July 13, 2011 1:11 PM
>>>>> To: [email protected]
>>>>> Subject: Emitting distance from centroid for K-Means
>>>>> 
>>>>> Does it make sense to output the distance to the cluster as the weight in
>>>>> the KMeansClusterer.outputPointWithClusterInfo method instead of 1? What's
>>>>> the purpose of the 1 as the weight?
>>>>> 
>>>>> -Grant
>>>>> 
>>>>> 
>>>>> 
>>> 
>>> --------------------------
>>> Grant Ingersoll
>>> 
>>> 
>>> 
>> 
> 

--------------------------
Grant Ingersoll