Hard to dispute that! This definitely sounds less like a theory problem than like simple implementation woes.
On Sun, Jun 28, 2009 at 2:55 PM, Grant Ingersoll wrote:

> On Jun 28, 2009, at 4:56 PM, Grant Ingersoll wrote:
>
>> I get all of this; my point is that when you rehydrate the Cluster, it
>> doesn't properly report the centroid (per my email), all because numPoints == 0
>> and pointTotal is a vector that is the same as the passed-in center
>> vector, but initialized to 0.
>
> In other words, the simple act of serializing a Cluster to HDFS and then
> reconstituting it should not alter the result one gets, which I believe is
> what happens if one dumps out the clusters that have been calculated after
> the whole process is done.
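To make the round-trip property concrete, here is a minimal sketch using a hypothetical `SimpleCluster` class (not Mahout's actual `Cluster`). It is written in the style of Hadoop's `write`/`readFields` pair but uses only `java.io`. The assumption is that the centroid is computed as `pointTotal / numPoints`. If `readFields` restored only `center` and left `pointTotal` zeroed with `numPoints == 0`, as described above, the rehydrated copy would report a different centroid than the original. Here every field is written and read back, so serializing and deserializing should not change `computeCentroid()`. The fallback to `center` when `numPoints == 0` is one possible choice, not necessarily what Mahout does.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical, simplified cluster; NOT Mahout's Cluster class.
class SimpleCluster {
    private double[] center;
    private double[] pointTotal;
    private int numPoints;

    SimpleCluster() { } // no-arg constructor used before readFields()

    SimpleCluster(double[] center) {
        this.center = center.clone();
        this.pointTotal = new double[center.length]; // same size as center, all zeros
        this.numPoints = 0;
    }

    void addPoint(double[] p) {
        for (int i = 0; i < p.length; i++) pointTotal[i] += p[i];
        numPoints++;
    }

    // Centroid = pointTotal / numPoints; assumed fallback to center when empty.
    double[] computeCentroid() {
        if (numPoints == 0) return center.clone();
        double[] c = new double[pointTotal.length];
        for (int i = 0; i < c.length; i++) c[i] = pointTotal[i] / numPoints;
        return c;
    }

    // Serialize ALL state, including the running totals.
    void write(DataOutput out) throws IOException {
        writeVector(out, center);
        writeVector(out, pointTotal);
        out.writeInt(numPoints);
    }

    // Read fields back in exactly the order write() emitted them.
    void readFields(DataInput in) throws IOException {
        center = readVector(in);
        pointTotal = readVector(in);
        numPoints = in.readInt();
    }

    private static void writeVector(DataOutput out, double[] v) throws IOException {
        out.writeInt(v.length);
        for (double d : v) out.writeDouble(d);
    }

    private static double[] readVector(DataInput in) throws IOException {
        double[] v = new double[in.readInt()];
        for (int i = 0; i < v.length; i++) v[i] = in.readDouble();
        return v;
    }

    // Serialize to bytes and rehydrate into a fresh instance.
    static SimpleCluster roundTrip(SimpleCluster c) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        c.write(new DataOutputStream(bytes));
        SimpleCluster copy = new SimpleCluster();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        return copy;
    }
}

class ClusterRoundTripDemo {
    public static void main(String[] args) throws IOException {
        SimpleCluster c = new SimpleCluster(new double[] {0.0, 0.0});
        c.addPoint(new double[] {1.0, 2.0});
        c.addPoint(new double[] {3.0, 4.0});
        SimpleCluster copy = SimpleCluster.roundTrip(c);
        // Both lines print [2.0, 3.0]
        System.out.println(Arrays.toString(c.computeCentroid()));
        System.out.println(Arrays.toString(copy.computeCentroid()));
    }
}
```

A cheap regression test along these lines (cluster, serialize, deserialize, compare centroids) would catch the kind of mismatch described in the thread.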
