On Sat, Jun 27, 2009 at 8:10 AM, Grant Ingersoll <[email protected]> wrote:
On Jun 26, 2009, at 10:42 PM, Grant Ingersoll wrote:
> The semantics of constructing a Cluster are odd to me. Do I always
> have to immediately add a point to the Cluster in order for it to be
> "real", despite the fact that I added a Center? Isn't adding a Center
> effectively giving the Cluster one point?
Perhaps I misunderstood you, but I think that assigning a new point to
a Cluster (by calling addPoint(Vector)) does not mean you are "adding
a center". A center is specified at the beginning of the algorithm,
and every iteration, after including a set of new points, recalculates
that center by computing a new mean, which then becomes the centroid
of that particular Cluster. So the center itself is already a proper
point in the Cluster, and you don't need to add it again after it has
been selected as the center in order for it to be "real".
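To make the recalculation step concrete, here is a minimal sketch of
one k-means iteration in plain Python. It is not Mahout's code: the
names nearest and kmeans_step and the sample points are made up for
illustration, and it assumes squared Euclidean distance and starting
centers drawn from the data itself.

```python
def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centers[i])),
    )


def kmeans_step(points, centers):
    """One iteration: assign each point to its nearest center, then
    replace each center with the mean (centroid) of its assigned points."""
    groups = [[] for _ in centers]
    for pt in points:
        groups[nearest(pt, centers)].append(pt)

    new_centers = []
    for center, group in zip(centers, groups):
        if not group:
            # No points were assigned: keep the old center unchanged.
            new_centers.append(center)
        else:
            dims = len(center)
            new_centers.append(
                tuple(sum(p[d] for p in group) / len(group) for d in range(dims))
            )
    return new_centers


# Hypothetical data; the starting centers (1,1) and (9,9) are points of the set,
# so each of them is counted exactly once when the means are recomputed.
points = [(1.0, 1.0), (3.0, 3.0), (9.0, 9.0), (11.0, 11.0)]
centers = [(1.0, 1.0), (9.0, 9.0)]
new_centers = kmeans_step(points, centers)
```

Because the starting centers are ordinary members of the point set
here, they take part in the mean without being added a second time.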
> And if you add the center, why isn't it the centroid until other
> points are added?
Again, the centroid is the result of recalculating a mean and may or
may not be a real point. With just one point in a Cluster (that is to
say, its center) there's no "recalculation" to be done. Conceptually,
you could say the centroid coincides with the center, though that's
not relevant to the algorithm.
A final example. Let's say you create a Cluster C with point (1,1) as
its center. Then, you add (3,3) to it.
Cluster C: (1,1);(3,3) - original center: (1,1) - centroid: (2,2)
Now, you create another Cluster C' with the same center, but decide to
add the center point (1,1) again. Then, (3,3) is added.
Cluster C': (1,1);(1,1);(3,3) - original center: (1,1) - centroid: (5/3, 5/3)
OK, that was an unnecessary example. Got it. But it shows that C and
C' are not the same cluster, because repeated points contribute to the
overall mean.
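The same arithmetic can be checked with a small Python sketch. The
Cluster class below is a hypothetical stand-in, not Mahout's Cluster:
it assumes the center counts as the first point, as I described above,
and it uses exact fractions so that 5/3 is not rounded.

```python
from fractions import Fraction


class Cluster:
    """Hypothetical stand-in (not Mahout's class): the center is the first point."""

    def __init__(self, center):
        self.center = tuple(center)
        self.points = [self.center]  # the center already counts as one point

    def add_point(self, point):
        self.points.append(tuple(point))

    def centroid(self):
        """Mean of all points, including the center and any repeated points."""
        n = len(self.points)
        dims = len(self.center)
        return tuple(Fraction(sum(p[d] for p in self.points), n) for d in range(dims))


# Cluster C: center (1,1), then (3,3) is added.
c = Cluster((1, 1))
c.add_point((3, 3))

# Cluster C': same center, the center is added again, then (3,3).
c_prime = Cluster((1, 1))
c_prime.add_point((1, 1))
c_prime.add_point((3, 3))

# A cluster holding only its center: the centroid coincides with it.
single = Cluster((1, 1))

print(c.centroid(), c_prime.centroid(), single.centroid())
```

Under these assumptions c.centroid() is (2, 2), c_prime.centroid() is
(5/3, 5/3), and single.centroid() is just the center (1, 1).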