On Jun 24, 2009, at 6:53 PM, Ted Dunning wrote:
Grant,
This optimization should have made a large difference. Did it?
Yes. Still quantifying, but very promising. Still having a hard time
finding good t1, t2 values for the simple tests I am running
clustering Wikipedia data, so that is clouding things. It seems no
matter what I pick, I get one vector per canopy. Obviously, something
is wrong, but I don't know what. Sigh. Of course, it could be that
the docs I'm clustering aren't related, I guess. I'm only doing
the first 1000 from a dump. I'll try a bigger version now.
All the tests pass with the changes, though, and I had the same
problem before.
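
A minimal sketch of how the "one vector per canopy" symptom can arise, assuming plain Euclidean distance and a simplified sequential canopy pass. This is not Mahout's implementation; the names canopies and pairwise_distance_range are illustrative only. If t1 is smaller than the smallest pairwise distance in the data, no point ever joins another point's canopy, so comparing t1 and t2 against the observed distance range is a cheap first check.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopies(points, t1, t2, dist=euclidean):
    """Simplified sequential canopy clustering (illustrative, not Mahout's code).

    t1 is the loose threshold (canopy membership), t2 the tight threshold
    (removal from the candidate list). Requires t1 > t2.
    Returns a list of (center, members) pairs.
    """
    if t1 <= t2:
        raise ValueError("t1 must be greater than t2")
    remaining = list(points)
    result = []
    while remaining:
        # Take the next unclaimed point as a new canopy center.
        center = remaining.pop(0)
        members = [center]
        still_remaining = []
        for p in remaining:
            d = dist(center, p)
            if d < t1:
                members.append(p)          # loosely bound to this canopy
            if d >= t2:
                still_remaining.append(p)  # may still seed or join other canopies
        remaining = still_remaining
        result.append((center, members))
    return result


def pairwise_distance_range(points, dist=euclidean):
    """Smallest and largest pairwise distance, for sanity-checking t1 and t2."""
    ds = [
        dist(points[i], points[j])
        for i in range(len(points))
        for j in range(i + 1, len(points))
    ]
    return min(ds), max(ds)
```

Running pairwise_distance_range on a sample of the vectors before choosing thresholds shows whether t1 and t2 fall inside the range of distances that actually occur. The full pairwise scan is quadratic, so for a large dump it would only be practical on a sample.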
The triangle inequality trick should help by a factor of two or more as well.
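
The message does not spell the trick out, so the following is only one common reading of it, sketched under assumptions: the distance is a true metric (Euclidean here), and center-to-center distances are precomputed. Since d(p, c_j) >= |d(c_0, c_j) - d(p, c_0)|, a point can be ruled out of canopy j without computing d(p, c_j) whenever that lower bound is already at least t1. The function name assign_with_pruning and the choice of c_0 as the reference center are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_with_pruning(points, centers, t1, dist=euclidean):
    """Find, for each point, the centers within t1, skipping provably-far centers.

    Uses the triangle inequality with centers[0] as a reference:
        d(p, c_j) >= |d(c_0, c_j) - d(p, c_0)|
    Returns (membership, calls), where membership[i] lists the indices of
    centers closer than t1 to points[i], and calls counts point-to-center
    distance evaluations (the precomputed center-to-center ones are not counted).
    """
    k = len(centers)
    # Distances from the reference center to every other center, computed once.
    ref_to_center = [dist(centers[0], centers[j]) for j in range(k)]
    calls = 0
    membership = []
    for p in points:
        d0 = dist(p, centers[0])
        calls += 1
        near = [0] if d0 < t1 else []
        for j in range(1, k):
            # Lower bound on d(p, c_j); if it is already >= t1, skip the real distance.
            if abs(ref_to_center[j] - d0) >= t1:
                continue
            calls += 1
            if dist(p, centers[j]) < t1:
                near.append(j)
        membership.append(near)
    return membership, calls
```

How much this saves depends on how well separated the centers are relative to t1; the sketch makes no claim about the actual speedup on the Wikipedia data.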