Hi,

I'm doing some experiments with k-means and I have a few questions about the
way the cluster size is computed (this applies to other clustering algorithms
as well).

1. AbstractCluster stores the number of points. It looks like the method
computeParameters() uses "s0" to determine the number of points. "s0" is
computed from the weight we assign to each point; the default weight is 1.0,
so there's no problem. However, if we change the weight, the number of points
would be off, wouldn't it? Is that intentional?
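To make the concern in (1) concrete, here is a minimal sketch. It is a
hypothetical class, not Mahout's actual AbstractCluster: it accumulates s0 as
the sum of point weights next to a plain count of observed points, and it
assumes (as described above) that the point count is derived from s0 by a
simple cast. With a uniform weight of 2.0 the weighted centroid is unchanged,
but the s0-based count doubles.

```java
// Hypothetical sketch, NOT Mahout's AbstractCluster: it only illustrates
// what happens if a "number of points" is derived from the weight sum s0.
public class WeightedClusterStats {
    private double s0 = 0.0;          // sum of weights (zeroth moment)
    private double[] s1;              // weighted sum of the points (first moment)
    private long observedPoints = 0;  // plain number of observe() calls

    /** Adds one point with the given weight to the running sums. */
    public void observe(double[] x, double weight) {
        if (s1 == null) {
            s1 = new double[x.length];
        }
        for (int i = 0; i < x.length; i++) {
            s1[i] += weight * x[i];
        }
        s0 += weight;
        observedPoints++;
    }

    public long getObservedPoints() {
        return observedPoints;
    }

    /** Weighted centroid s1 / s0; a uniform weight cancels out here. */
    public double[] getCenter() {
        double[] center = new double[s1.length];
        for (int i = 0; i < s1.length; i++) {
            center[i] = s1[i] / s0;
        }
        return center;
    }

    /** Assumption: the point count is taken from s0 with a truncating cast. */
    public long numPointsFromS0() {
        return (long) s0;
    }

    public static void main(String[] args) {
        WeightedClusterStats stats = new WeightedClusterStats();
        stats.observe(new double[] {1.0, 2.0}, 2.0);
        stats.observe(new double[] {3.0, 4.0}, 2.0);
        stats.observe(new double[] {5.0, 6.0}, 2.0);
        System.out.println("points observed: " + stats.getObservedPoints()); // points observed: 3
        System.out.println("count from s0: " + stats.numPointsFromS0());      // count from s0: 6
    }
}
```

The centroid stays at (3.0, 4.0) either way, which is why a non-default
weight would go unnoticed in the centers but show up in the reported size.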

2. Regardless of (1), it seems that the cluster dumper does not always print
the right number of points for a cluster. I haven't looked into it much yet,
but my first guess is that "numPoints" in AbstractCluster refers to the number
of points in the cluster for the given iteration, which is what the dumper
prints, while the actual number of points in a given cluster might change once
the final assignment of points to clusters is done. I will look into it
further, but if you have any pointers, that would save me time.
The ClusterLabels class computes the number of points in a cluster from the
actual clusteredPoints directory and gets it right.
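The guess in (2) can be illustrated with a toy 1-D k-means step. This is
hypothetical code, not the Mahout clustering or dumper implementation: the
counts accumulated while assigning points to the old centers can differ from
the counts obtained by assigning the same points to the updated centers
afterwards, which is presumably closer to what ClusterLabels sees in
clusteredPoints.

```java
import java.util.Arrays;

// Hypothetical 1-D sketch, NOT Mahout code. It contrasts the per-iteration
// count (points observed against the old centers) with a count taken from a
// separate assignment pass against the updated centers.
public class IterationVsFinalCounts {

    /** Index of the nearest center to x (ties go to the lower index). */
    static int nearest(double x, double[] centers) {
        int best = 0;
        for (int k = 1; k < centers.length; k++) {
            if (Math.abs(x - centers[k]) < Math.abs(x - centers[best])) {
                best = k;
            }
        }
        return best;
    }

    /** Counts points per cluster when assigned against the given centers. */
    static int[] countAssignments(double[] points, double[] centers) {
        int[] counts = new int[centers.length];
        for (double x : points) {
            counts[nearest(x, centers)]++;
        }
        return counts;
    }

    /** One k-means update: each center becomes the mean of its assigned points. */
    static double[] updateCenters(double[] points, double[] centers) {
        double[] sums = new double[centers.length];
        int[] counts = new int[centers.length];
        for (double x : points) {
            int k = nearest(x, centers);
            sums[k] += x;
            counts[k]++;
        }
        double[] updated = centers.clone();
        for (int k = 0; k < centers.length; k++) {
            if (counts[k] > 0) {
                updated[k] = sums[k] / counts[k];
            }
        }
        return updated;
    }

    public static void main(String[] args) {
        double[] points = {0.0, 1.0, 3.0, 10.0};
        double[] centers = {0.0, 4.0};

        // Counts seen during the iteration (against the old centers).
        int[] duringIteration = countAssignments(points, centers);
        double[] newCenters = updateCenters(points, centers);
        // Counts from assigning the same points to the updated centers.
        int[] afterAssignment = countAssignments(points, newCenters);

        System.out.println(Arrays.toString(duringIteration)); // [2, 2]
        System.out.println(Arrays.toString(newCenters));      // [0.5, 6.5]
        System.out.println(Arrays.toString(afterAssignment)); // [3, 1]
    }
}
```

The point at 3.0 belongs to the second cluster while the centers are being
recomputed, but moves to the first one once the centers settle at 0.5 and 6.5.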

Thanks!

-- 
Yuval
