On Sat, Jun 27, 2009 at 8:10 AM, Grant Ingersoll<[email protected]> wrote:
>
> On Jun 26, 2009, at 10:42 PM, Grant Ingersoll wrote:
>
>>
>> The semantics of constructing a Cluster are odd to me. Do I always have
>> to immediately add a point to the Cluster in order for it to be "real",
>> despite the fact that I added a Center? Isn't adding a Center effectively
>> giving the Cluster one point?
>>

Perhaps I misunderstood you, but I think that assigning a new point to a
Cluster (by calling addPoint(Vector)) does not mean you are "adding a
center". A center is specified at the beginning of the algorithm, and each
iteration, after a set of new points has been assigned, recalculates that
center by computing a new mean, which is now the centroid of that
particular Cluster.

So, clearly, the center itself is a proper point in the Cluster, and you
don't need to add it again after it has been selected as the center in
order for it to be "real".

> And if you add the center, why isn't it the centroid until other points are
> added?
>

Again, the centroid is the result of recalculating a mean and may or may
not be a real point. With just one point in a Cluster (that is to say, its
center) there is no "recalculation" to be done. Conceptually, you could
say the centroid lies, in fact, at the center, though that is not relevant
to the algorithm.

A final example. Let's say you create a Cluster C with point (1,1) as its
center. Then, you add (3,3) to it.

Cluster C: (1,1);(3,3)
- original center: (1,1)
- centroid: (2,2)

Now, you create another Cluster C' with the same center, but decide to add
the center point again. Then, (3,3) is added.

Cluster C': (1,1);(1,1);(3,3)
- original center: (1,1)
- centroid: (5/3, 5/3)

OK, that was an unnecessary example. Got it. But it shows that C and C' are
not the same cluster, because repeated points contribute to the overall
mean.
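
To make the arithmetic in the example easy to check, here is a minimal sketch in plain Python. It is not Mahout's Cluster API: the centroid() helper and the list-of-tuples representation are assumptions made only for illustration, and the center is simply treated as the first point in each list.

```python
from fractions import Fraction


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples.

    Hypothetical helper for illustration only; not part of Mahout.
    Fractions keep results such as 5/3 exact.
    """
    n = len(points)
    return tuple(sum(Fraction(x) for x in coords) / n for coords in zip(*points))


# Cluster C: center (1,1), then (3,3) is added.
c = [(1, 1), (3, 3)]

# Cluster C': same center, the center point added again, then (3,3).
c_prime = [(1, 1), (1, 1), (3, 3)]

# A cluster holding only its center: the mean is the center itself.
only_center = [(1, 1)]

print(centroid(c))
print(centroid(c_prime))
print(centroid(only_center))
```

The repeated (1,1) in C' pulls the mean toward the center, which is why the two clusters end up with different centroids.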
