To clarify, the optimizations I know about that are pending are:

1) a better hash table that doesn't box/unbox primitive keys and values (clear win)

2) a better centroid distance computation that uses the sparsity of the
document vector to minimize the L2 norm computation time (this is what I am
curious about)

3) use the triangle inequality to limit the number of times we have to
compute distances (probably a 2x win, but I am curious)
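To illustrate (1): the win comes from avoiding Integer/Double boxing on every get/put, as with java.util.HashMap. A minimal sketch of an open-addressing map with primitive int keys and double values (names and the hash mixing constant are illustrative, and resizing is omitted for brevity):

```java
// Hypothetical sketch: open-addressing hash map over primitive arrays,
// so lookups and inserts never allocate Integer/Double boxes.
// Resizing is omitted; the caller must size capacity above the entry count.
final class IntDoubleMap {
    private final int[] keys;
    private final double[] values;
    private final boolean[] used;

    IntDoubleMap(int capacity) {
        keys = new int[capacity];
        values = new double[capacity];
        used = new boolean[capacity];
    }

    private int slot(int key) {
        // spread the key bits, then force a non-negative index
        int i = ((key * 0x9E3779B9) >>> 1) % keys.length;
        // linear probing: walk until we hit the key or an empty slot
        while (used[i] && keys[i] != key) {
            i = (i + 1) % keys.length;
        }
        return i;
    }

    void put(int key, double value) {
        int i = slot(key);
        if (!used[i]) { used[i] = true; keys[i] = key; }
        values[i] = value;
    }

    double get(int key, double missing) {
        int i = slot(key);
        return used[i] ? values[i] : missing;
    }
}
```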
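For (2), the idea I have in mind is the expansion ||x - c||^2 = ||c||^2 - 2 x.c + ||x||^2: with the centroid's squared norm precomputed once per iteration, the per-document cost is a single pass over the nonzero entries of the document vector. A sketch under those assumptions (the sparse vector is represented here as parallel index/value arrays, which is illustrative, not Mahout's actual API):

```java
// Hypothetical sketch: squared Euclidean distance from a sparse document
// vector to a dense centroid, touching only the document's nonzeros.
// centroidNorm2 = ||c||^2, precomputed once per centroid per iteration.
final class SparseDistance {
    static double distanceSquared(int[] indices, double[] nonzeros,
                                  double[] centroid, double centroidNorm2) {
        double dot = 0.0;     // x . c over nonzeros of x
        double xNorm2 = 0.0;  // ||x||^2 over nonzeros of x
        for (int k = 0; k < indices.length; k++) {
            double v = nonzeros[k];
            dot += v * centroid[indices[k]];
            xNorm2 += v * v;
        }
        return centroidNorm2 - 2.0 * dot + xNorm2;
    }
}
```

Floating-point cancellation can make the result slightly negative for points very near a centroid, so a real implementation would clamp at zero.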
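For (3), the pruning I mean is the standard Elkan-style bound: if d(c, c') >= 2 d(x, c) for the current best centroid c, then the triangle inequality gives d(x, c') >= d(x, c), so c' cannot win and its distance never needs to be computed. A minimal sketch, assuming the pairwise centroid distances are computed once per iteration (the method names are mine):

```java
// Hypothetical sketch: nearest-centroid search that skips a candidate c'
// whenever d(best, c') >= 2 * d(x, best), by the triangle inequality.
// centroidDists[i][j] = d(c_i, c_j), precomputed once per iteration.
final class TrianglePruning {
    static int nearestCentroid(double[] x, double[][] centroids,
                               double[][] centroidDists) {
        int best = 0;
        double bestDist = euclidean(x, centroids[0]);
        for (int j = 1; j < centroids.length; j++) {
            // d(c_best, c_j) >= 2 d(x, c_best)  =>  d(x, c_j) >= d(x, c_best)
            if (centroidDists[best][j] >= 2.0 * bestDist) {
                continue; // pruned: no distance computation needed
            }
            double d = euclidean(x, centroids[j]);
            if (d < bestDist) { bestDist = d; best = j; }
        }
        return best;
    }

    static double euclidean(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            double t = a[i] - b[i];
            s += t * t;
        }
        return Math.sqrt(s);
    }
}
```

How big the win is depends on how well-separated the centroids are, which is why I'd want to measure rather than assume the 2x.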

I realized that my question was ambiguous after I asked.

On Wed, Jun 24, 2009 at 4:20 PM, Grant Ingersoll <[email protected]> wrote:

> Still quantifying, but very promising




-- 
Ted Dunning, CTO
DeepDyve
