On Jul 27, 2009, at 12:55 PM, Shashikant Kore wrote:
On Mon, Jul 27, 2009 at 10:11 PM, Grant Ingersoll <[email protected]> wrote:
Not following. The distance calc stuff is irrespective of the type of
Vector. I was referring to the centroid length square (I think you
called it the triangle inequality) stuff that Shashikant added on
MAHOUT-121. We use it for testing convergence, but not for other
distance calculations. I haven't looked to see if it is applicable yet,
but it seems like it should be.
Grant,
Yes, that part of the patch is missing. In my original patch, I had
modified emitPointToNearestCluster() in kmeans/Cluster.java to
calculate the distance between a document and the centroids of the
various clusters. (There is no triangle inequality code, though.) I
don't see that code in the later patches.
I had reviewed the final patch, but I missed this one. I think I only
ran Canopy and not K-means. Incidentally, I am hopelessly out of date
with trunk, as I have not worked on this recently. BTW, I haven't
really followed this thread in depth, so I might be speaking out of
context here. Apologies.
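For readers of the archive, here is a rough sketch of the general idea
behind the "centroid length square" approach discussed above. It is not
the MAHOUT-121 patch and not Mahout's API: the class and method names
are hypothetical, and the sparse document is assumed to be a plain
index-to-weight map. It relies only on the identity
|x - c|^2 = |x|^2 - 2(x . c) + |c|^2, so with |c|^2 cached per centroid,
each distance costs one sparse dot product.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names come from Mahout.
final class CentroidDistanceSketch {

    // Squared length of a dense centroid; computed once per centroid and cached.
    static double lengthSquared(double[] v) {
        double sum = 0.0;
        for (double x : v) {
            sum += x * x;
        }
        return sum;
    }

    // Squared length of a sparse point (index -> weight).
    static double lengthSquared(Map<Integer, Double> point) {
        double sum = 0.0;
        for (double x : point.values()) {
            sum += x * x;
        }
        return sum;
    }

    // Dot product that touches only the non-zero entries of the point.
    // Assumes every index in the point is within the centroid's bounds.
    static double dot(Map<Integer, Double> point, double[] centroid) {
        double sum = 0.0;
        for (Map.Entry<Integer, Double> e : point.entrySet()) {
            sum += e.getValue() * centroid[e.getKey()];
        }
        return sum;
    }

    // |x - c|^2 = |x|^2 - 2(x . c) + |c|^2, using the cached squared lengths.
    static double squaredDistance(Map<Integer, Double> point, double pointLenSq,
                                  double[] centroid, double centroidLenSq) {
        double d = pointLenSq - 2.0 * dot(point, centroid) + centroidLenSq;
        return Math.max(0.0, d); // guard against tiny negatives from rounding
    }

    // Index of the centroid nearest to the point.
    static int nearest(Map<Integer, Double> point, double[][] centroids,
                       double[] centroidLenSq) {
        double pointLenSq = lengthSquared(point);
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centroids.length; i++) {
            double d = squaredDistance(point, pointLenSq, centroids[i], centroidLenSq[i]);
            if (d < bestDist) {
                bestDist = d;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<Integer, Double> doc = new HashMap<>();
        doc.put(0, 1.0);
        doc.put(2, 2.0);
        double[][] centroids = {{0.0, 0.0, 0.0}, {1.0, 0.0, 2.0}};
        double[] cached = {lengthSquared(centroids[0]), lengthSquared(centroids[1])};
        System.out.println("nearest centroid index: " + nearest(doc, centroids, cached));
    }
}
```

Squared distances are enough for picking the nearest cluster, since the
square root does not change the ordering.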
I'll be on a plane tomorrow, and will see if I can track down the
differences.
-Grant