On Jun 24, 2009, at 7:20 PM, Grant Ingersoll wrote:


On Jun 24, 2009, at 6:53 PM, Ted Dunning wrote:

Grant,

This optimization should have made a large difference.  Did it?

Yes. Still quantifying, but very promising. I'm still having a hard time finding good t1, t2 values for the simple tests I am running clustering Wikipedia data, so that is clouding things. It seems no matter what I pick, I get one vector per canopy. Obviously, something is wrong, but I don't know what. Sigh. Of course, it could be that the docs I'm clustering aren't related, I guess. I'm only doing the first 1000 from a dump. I'll try a bigger version now.

Switching to the CosineDistanceMeasure has improved things. Sorry for the thread hijack.
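
As an illustration of why the choice of distance measure can matter so much for t1 and t2, here is a minimal sketch of canopy generation in Python. It is not Mahout's implementation: the function names, the toy term-count vectors, and the threshold values are all assumptions made up for the example. It only shows one plausible cause of the "one vector per canopy" symptom: Euclidean distance on raw term counts grows with document length, while cosine distance stays between 0 and 1 for non-negative vectors, which makes thresholds easier to pick.

```python
import math

def euclidean(a, b):
    # a and b are sparse vectors stored as {term: weight} dicts.
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))

def cosine_distance(a, b):
    # 1 - cosine similarity; treated as 1.0 if either vector is all zeros.
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)

def canopies(points, t1, t2, distance):
    # Simplified canopy generation (hypothetical helper, t1 > t2 > 0):
    # take the first remaining point as a center, put every remaining point
    # within t1 into its canopy, and drop points within t2 from further
    # consideration as centers.
    assert t1 > t2 > 0
    remaining = list(range(len(points)))
    result = []
    while remaining:
        center = remaining[0]
        members = []
        still_remaining = []
        for i in remaining:
            d = distance(points[center], points[i])
            if d < t1:
                members.append(i)
            if i != center and d >= t2:
                still_remaining.append(i)
        result.append(members)
        remaining = still_remaining
    return result

# Toy data (made up): two topics, each with a short and a long document.
docs = [
    {"cluster": 1, "canopy": 1},
    {"cluster": 10, "canopy": 10},
    {"lucene": 2, "index": 1},
    {"lucene": 20, "index": 10},
]

euclidean_result = canopies(docs, 2.0, 1.0, euclidean)
cosine_result = canopies(docs, 0.5, 0.2, cosine_distance)
```

With these made-up thresholds, the Euclidean run puts every document in its own canopy, because even same-topic documents are far apart when their lengths differ, while the cosine run groups the documents by topic. Larger Euclidean thresholds would merge vectors too, but largely by length rather than by content.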


All the tests pass with the changes, though, and I had the same problem before.


The triangle inequality trick should help by a factor of two or more as
well.
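
The thread does not say which form of the triangle inequality trick is meant, so the following is only a sketch of one common variant, with hypothetical function names and toy data. When looking for the nearest center to a point p, if the distance between the current best center and another center c is at least twice the distance from p to the current best, then c cannot be closer to p, and its distance need not be computed.

```python
import math

def dist(a, b):
    # Plain Euclidean distance between two equal-length tuples.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_center(point, centers, center_dists):
    # Returns (index of nearest center, number of point-to-center distances computed).
    # center_dists[i][j] is the precomputed distance between centers i and j.
    best = 0
    best_d = dist(point, centers[0])
    computed = 1
    for j in range(1, len(centers)):
        # Triangle inequality: d(c_best, c_j) <= d(c_best, p) + d(p, c_j).
        # So if d(c_best, c_j) >= 2 * d(p, c_best), then d(p, c_j) >= d(p, c_best)
        # and center j can be skipped without computing d(p, c_j).
        if center_dists[best][j] >= 2.0 * best_d:
            continue
        d = dist(point, centers[j])
        computed += 1
        if d < best_d:
            best, best_d = j, d
    return best, computed

# Toy data (made up for the example).
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
center_dists = [[dist(a, b) for b in centers] for a in centers]

near_origin = nearest_center((1.0, 1.0), centers, center_dists)
between = nearest_center((6.0, 1.0), centers, center_dists)
```

The savings depend on the data: the pruning only pays off when centers are well separated relative to the point-to-center distances, and the center-to-center distances have to be recomputed whenever the centers move.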


