Hi Yuval,

I haven't looked, but I don't want to leave you hanging.  This is definitely 
something we should check on and you may very well have found a bug.  Perhaps 
you can write up a test case?  I will try to look at this soon, if someone else 
doesn't beat me to it.

-Grant

On Jan 22, 2011, at 1:56 AM, Yuval Merhav wrote:

> Hi,
> 
> I'm doing some experiments with Kmeans and I have a few doubts regarding the
> way the cluster size is computed (this relates to other clustering algorithms
> as well).
> 
> 1. AbstractCluster stores the number of points. It looks like the
> method computeParameters() uses "s0" to determine the number of points.
> "s0" is computed based on the weight we assign to a point; the default is
> 1.0, so there's no problem. However, if we modify the weight, then the
> number of points would be off, wouldn't it? Is that intentional?
> 
> 2. Regardless of (1), it seems that the cluster dumper does not always print
> the right number of points for a cluster. I haven't looked into
> it much yet, but my first guess is that "numPoints" in
> AbstractCluster refers to the number of points in the cluster for the given
> iteration, which is what the dumper prints, while the actual number of points
> for a given cluster might change after the actual assignment of points to
> clusters is done. I will look into it further, but any pointers that
> would save me time are welcome.
> The ClusterLabels class computes the number of points in a cluster from the
> actual clusteredPoints directory and gets it right.
> 
> Thanks!
> 
> -- 
> Yuval
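
As a starting point for the test case, here is a minimal, self-contained sketch of the concern in point 1. It is not Mahout's code: ToyCluster, observe(), numPointsFromS0() and numPointsObserved() are hypothetical names, and deriving the count by casting "s0" to an int is an assumption made only to illustrate how a weight-based count can drift from the real number of points.

```java
// Hypothetical stand-in for a cluster; NOT Mahout's AbstractCluster.
class ToyCluster {
    private double s0 = 0.0;   // running sum of the weights of observed points
    private int observed = 0;  // plain count of observed points, for comparison

    // Record one point with the given weight (the point's vector is omitted here).
    void observe(double weight) {
        s0 += weight;
        observed++;
    }

    // Assumption: the reported size is derived from the weight sum s0.
    int numPointsFromS0() {
        return (int) s0;
    }

    // The size obtained by simply counting the points.
    int numPointsObserved() {
        return observed;
    }
}

public class WeightedCountDemo {
    public static void main(String[] args) {
        ToyCluster unit = new ToyCluster();
        ToyCluster half = new ToyCluster();
        for (int i = 0; i < 3; i++) {
            unit.observe(1.0);  // default weight: both counts agree
            half.observe(0.5);  // modified weight: the s0-based count drifts
        }
        System.out.println("weight 1.0: fromS0=" + unit.numPointsFromS0()
                + " observed=" + unit.numPointsObserved());
        System.out.println("weight 0.5: fromS0=" + half.numPointsFromS0()
                + " observed=" + half.numPointsObserved());
    }
}
```

With the default weight of 1.0 the two counts agree; with any other weight they diverge. A real test case would assert the same thing against the actual cluster classes.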
