On Jun 24, 2009, at 7:20 PM, Grant Ingersoll wrote:
On Jun 24, 2009, at 6:53 PM, Ted Dunning wrote:
Grant,
This optimization should have made a large difference. Did it?
Yes. Still quantifying, but very promising. Still having a hard
time finding good t1, t2 values for the simple tests I am running
clustering Wikipedia data, so that is clouding things. It seems no
matter what I pick, I get one vector per canopy. Obviously,
something is wrong, but I don't know what. Sigh. Of course, it
could be that the docs I'm clustering aren't related, I guess.
I'm only doing the first 1000 from a dump. I'll try a bigger
version now.
Switching to the CosineDistanceMeasure has improved things. Sorry
for the thread hijack.
All the tests pass with the changes, though, and I had the same
problem before.
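
For readers following along, here is a minimal sketch of how the choice of
distance measure can produce the "one vector per canopy" symptom. It is not
Mahout's implementation: the function names (canopies, euclidean,
cosine_distance), the toy term-count vectors, and the t1/t2 values are all
made up for illustration. The assumption is that documents on the same topic
differ widely in length, so Euclidean distance is dominated by vector
magnitude while cosine distance is not.

```python
import math

def euclidean(a, b):
    # Straight-line distance; sensitive to vector magnitude (document length).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    # 1 - cos(angle); ignores magnitude and compares direction only.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)

def canopies(points, t1, t2, dist):
    # Simplified canopy clustering (hypothetical helper), assuming t1 > t2 > 0.
    # Each canopy is a list of point indices.
    remaining = list(range(len(points)))
    result = []
    while remaining:
        center = remaining[0]
        # Every remaining point within the loose threshold t1 joins the canopy.
        members = [i for i in remaining
                   if dist(points[center], points[i]) < t1]
        result.append(members)
        # Points within the tight threshold t2 can no longer seed a canopy.
        remaining = [i for i in remaining
                     if dist(points[center], points[i]) >= t2]
    return result

# Toy term counts: docs 0 and 1 share a topic, as do docs 2 and 3,
# but the second doc of each pair is much longer.
docs = [
    [1, 1, 0, 0],
    [10, 10, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 8, 8],
]

by_euclidean = canopies(docs, 1.5, 1.0, euclidean)
by_cosine = canopies(docs, 0.5, 0.2, cosine_distance)
print(by_euclidean)
print(by_cosine)
```

With these made-up numbers the Euclidean run puts every document in its own
canopy, while the cosine run groups the two topic pairs. Real Wikipedia
vectors will behave differently, so treat this only as one plausible
explanation of the symptom.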
The triangle inequality trick should help by a factor of two or
more as well.
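
The thread does not spell out which form of the triangle inequality trick is
meant. One common form, sketched below under that assumption, skips a
point-to-center distance computation whenever the two centers are at least
twice as far apart as the point is from its current best center. The names
nearest_center and center_dists and the sample centers are hypothetical. The
trick needs a true metric such as Euclidean distance; the 1 - cos form of
cosine distance does not satisfy the triangle inequality.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_center(point, centers, center_dists):
    # Returns (index of nearest center, number of distances computed).
    best = 0
    best_d = euclidean(point, centers[0])
    computed = 1
    for j in range(1, len(centers)):
        # Triangle inequality: d(c_best, c_j) <= d(x, c_best) + d(x, c_j).
        # So if d(c_best, c_j) >= 2 * d(x, c_best), then
        # d(x, c_j) >= d(x, c_best) and center j cannot be closer: skip it.
        if center_dists[best][j] >= 2 * best_d:
            continue
        d = euclidean(point, centers[j])
        computed += 1
        if d < best_d:
            best, best_d = j, d
    return best, computed

# Illustrative centers; center-to-center distances are computed once
# and reused for every point.
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
center_dists = [[euclidean(a, b) for b in centers] for a in centers]

near_first = nearest_center((1.0, 1.0), centers, center_dists)
near_second = nearest_center((9.0, 1.0), centers, center_dists)
print(near_first, near_second)
```

In this toy case the first point needs one distance computation instead of
three and the second needs two. The actual saving depends on how well
separated the centers are.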