I put up a patch; do you think it looks reasonable? I'm not totally thrilled with it, but it is a start.
On a related note, is there any reason why the input seeds can't be Vectors as an alternative to Cluster?

-Grant

On Jul 13, 2011, at 5:38 PM, Jeff Eastman wrote:

> Mostly. Clustering assigns points to one or more clusters, and it uses the
> distance measure or model pdf to do this. So the distance from each point to
> the cluster center is calculated in this step but thrown away once the
> assignment(s) is(are) made. This information could be output to another file
> or a different version could output the distance directly instead of the pdf.
> I don't know what that would mean for Dirichlet; however, since it only plays
> with pdf values.
>
> -----Original Message-----
> From: Grant Ingersoll [mailto:[email protected]]
> Sent: Wednesday, July 13, 2011 1:36 PM
> To: [email protected]
> Subject: Re: Emitting distance from centroid for K-Means
>
> Isn't --clustering the post processing step that already does it?
>
> On Jul 13, 2011, at 4:31 PM, Jeff Eastman wrote:
>
>> Well, distance is dependent upon the distance measure you want to use. A
>> post-processing step could easily calculate this. The ClusterEvaluator may
>> have some methods that could be useful. It calculates a set of
>> representative points for each cluster and calculates interCluster and
>> intraCluster densities from that.
>>
>> -----Original Message-----
>> From: Grant Ingersoll [mailto:[email protected]]
>> Sent: Wednesday, July 13, 2011 1:28 PM
>> To: [email protected]
>> Subject: Re: Emitting distance from centroid for K-Means
>>
>> Good to know. Next question, what's the preferred way, then, to get out
>> either the distance or what Ted said?
>>
>> -Grant
>>
>> On Jul 13, 2011, at 4:25 PM, Ted Dunning wrote:
>>
>>> I take back what I said.
>>>
>>> Jeff is correct.
>>>
>>> On Wed, Jul 13, 2011 at 1:23 PM, Jeff Eastman <[email protected]> wrote:
>>>
>>>> The weight is the probability the vector is a member of the cluster. For
>>>> FuzzyK and Dirichlet it is fractional, for KMeans it is 1 as the algorithm
>>>> is maximum likelihood and each point is only assigned to a single cluster.
>>>>
>>>> -----Original Message-----
>>>> From: Grant Ingersoll [mailto:[email protected]]
>>>> Sent: Wednesday, July 13, 2011 1:11 PM
>>>> To: [email protected]
>>>> Subject: Emitting distance from centroid for K-Means
>>>>
>>>> Does it make sense to output the distance to the cluster as the weight in
>>>> the KMeansClusterer.outputPointWithClusterInfo method instead of 1? What's
>>>> the purpose of the 1 as the weight?
>>>>
>>>> -Grant
>>>>
>>
>> --------------------------
>> Grant Ingersoll
>>
>
> --------------------------
> Grant Ingersoll
>

--------------------------
Grant Ingersoll
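
For readers following the thread, here is a rough sketch of the post-processing step discussed above: given the final centroids, recompute each point's distance to its nearest centroid so it can be emitted alongside the K-Means weight of 1. This is not Mahout code. The class `CentroidDistance`, the nested `Assignment` type, and both method names are hypothetical, vectors are plain `double[]` arrays, and Euclidean distance is assumed as the distance measure.

```java
import java.util.List;

/** Hypothetical helper for illustration only; not part of Mahout's API. */
final class CentroidDistance {

    /** Nearest-cluster result for one point: cluster index plus distance to its centroid. */
    static final class Assignment {
        final int clusterId;
        final double distance;

        Assignment(int clusterId, double distance) {
            this.clusterId = clusterId;
            this.distance = distance;
        }
    }

    /** Euclidean distance between two vectors of equal length (assumed distance measure). */
    static double euclidean(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /**
     * Finds the closest centroid to the point and keeps the distance
     * instead of discarding it after the assignment is made.
     */
    static Assignment nearest(double[] point, List<double[]> centroids) {
        if (centroids.isEmpty()) {
            throw new IllegalArgumentException("no centroids");
        }
        // Start from the first centroid, then look for anything closer.
        int best = 0;
        double bestDistance = euclidean(point, centroids.get(0));
        for (int k = 1; k < centroids.size(); k++) {
            double d = euclidean(point, centroids.get(k));
            if (d < bestDistance) {
                best = k;
                bestDistance = d;
            }
        }
        return new Assignment(best, bestDistance);
    }
}
```

Calling `CentroidDistance.nearest(point, centroids)` for each clustered point yields the cluster index and the distance together, which could then be written out next to the weight. Swapping in a different distance measure would only require replacing `euclidean`.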
