Hi Yuval,

I haven't looked at this yet, but I don't want to leave you hanging. This is definitely something we should check, and you may very well have found a bug. Could you write up a test case? I will try to look at this soon, if someone else doesn't beat me to it.
-Grant

On Jan 22, 2011, at 1:56 AM, Yuval Merhav wrote:

> Hi,
>
> I'm doing some experiments with Kmeans and I have a few doubts regarding the
> way the cluster size is computed (related to other clustering algorithms as
> well).
>
> 1. AbstractCluster stores the number of points. It looks like the method
> computeParameters() uses "s0" to determine the number of points. "s0" is
> computed based on the weight we assign to a point; the default is 1.0, so
> there's no problem. However, if we modify the weight, then the number of
> points would be off, wouldn't it? Is that intentional?
>
> 2. Regardless of (1), it seems that the cluster dumper does not always print
> the right number of points for a cluster. I didn't look into it too much
> yet, but my first guess would be that "numPoints" in AbstractCluster refers
> to the number of points in the cluster for the given iteration, which is
> what the dumper prints, while the actual number of points for a given
> cluster might change after the actual assignments of points to clusters are
> done. I will look into it further, but if you have any pointers, that would
> save me time.
>
> The ClusterLabels class computes the number of points in a cluster from the
> actual clusteredPoints directory and gets it right.
>
> Thanks!
>
> --
> Yuval
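
To make point 1 of the quoted message concrete, here is a minimal sketch of how a weighted "s0" can drift away from the real number of points. This is not Mahout's AbstractCluster: the class WeightedClusterSketch, its fields, and its methods are hypothetical names made up for illustration, and it only assumes that s0 is accumulated as the sum of the per-point weights, as described above.

```java
// Hypothetical illustration only; this is NOT Mahout's AbstractCluster.
// Assumption: s0 is the running sum of per-point weights.
public class WeightedClusterSketch {
    private double s0;          // sum of weights (zeroth moment)
    private final double[] s1;  // weighted sum of the point vectors
    private long observed;      // raw point count, independent of weight

    public WeightedClusterSketch(int dimensions) {
        this.s1 = new double[dimensions];
    }

    // Accumulate one point with the given weight.
    public void observe(double[] point, double weight) {
        if (point.length != s1.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        s0 += weight;
        for (int i = 0; i < s1.length; i++) {
            s1[i] += weight * point[i];
        }
        observed++;
    }

    public double getS0() {
        return s0;
    }

    public long getObserved() {
        return observed;
    }

    // Weighted centroid: s1 / s0 (only meaningful when s0 > 0).
    public double[] centroid() {
        double[] c = new double[s1.length];
        for (int i = 0; i < s1.length; i++) {
            c[i] = s1[i] / s0;
        }
        return c;
    }

    public static void main(String[] args) {
        WeightedClusterSketch cluster = new WeightedClusterSketch(2);
        cluster.observe(new double[] {1.0, 2.0}, 1.0);  // default weight
        cluster.observe(new double[] {3.0, 4.0}, 0.5);  // modified weight

        // Two points were observed, but s0 is 1.5, so a count derived
        // from s0 (truncated to an int) reports 1.
        System.out.println("observed points   = " + cluster.getObserved());
        System.out.println("s0                = " + cluster.getS0());
        System.out.println("count from (int)s0 = " + (int) cluster.getS0());
    }
}
```

With all weights left at 1.0 the two numbers agree, which matches the "no problem" case in the quoted message. Keeping a separate raw counter, as in this sketch, is just one possible way to keep the weighted centroid and the reported cluster size from interfering with each other.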
