To clarify, the optimizations I know about that are pending are (rough
sketches of each are appended at the end of this message):

1) a better hash table that doesn't box/unbox (clear win)

2) a better centroid distance computation that uses the sparsity of the
document vector to minimize the L2 norm computation time (this is what I
am curious about)

3) using the triangle inequality to limit the number of times we have to
compute distances (probably a 2x win, but I am curious)

I realized that my question was ambiguous after I asked.

On Wed, Jun 24, 2009 at 4:20 PM, Grant Ingersoll <[email protected]> wrote:

> Still quantifying, but very promising

--
Ted Dunning, CTO
DeepDyve
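
For 1), a minimal sketch of what a box-free table can look like: an
open-addressing map from int term ids to double weights built on parallel
primitive arrays. The class and method names here are hypothetical, not
Mahout's; the point is that java.util.HashMap<Integer, Double> boxes every
key and value and allocates a node object per entry, all of which this
layout avoids.

public class IntDoubleHashMap {
  private static final int EMPTY = -1;   // assumes keys (term ids) are non-negative

  private int[] keys;
  private double[] values;
  private int size;

  public IntDoubleHashMap(int capacity) {
    // round up to a power of two so we can mask instead of taking a modulus
    int cap = Integer.highestOneBit(Math.max(4, capacity) * 2);
    keys = new int[cap];
    values = new double[cap];
    java.util.Arrays.fill(keys, EMPTY);
  }

  private int slot(int key) {
    int mask = keys.length - 1;
    int h = key * 0x9E3779B9;              // cheap integer hash
    int i = (h ^ (h >>> 16)) & mask;
    while (keys[i] != EMPTY && keys[i] != key) {
      i = (i + 1) & mask;                  // linear probing
    }
    return i;
  }

  public void put(int key, double value) {
    int i = slot(key);
    if (keys[i] == EMPTY) {
      keys[i] = key;
      size++;
    }
    values[i] = value;
    if (size * 2 > keys.length) {
      rehash();                            // keep load factor under 1/2
    }
  }

  public double get(int key, double missing) {
    int i = slot(key);
    return keys[i] == key ? values[i] : missing;
  }

  private void rehash() {
    int[] oldKeys = keys;
    double[] oldValues = values;
    keys = new int[oldKeys.length * 2];
    values = new double[keys.length];
    java.util.Arrays.fill(keys, EMPTY);
    size = 0;
    for (int i = 0; i < oldKeys.length; i++) {
      if (oldKeys[i] != EMPTY) {
        put(oldKeys[i], oldValues[i]);
      }
    }
  }
}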
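
For 2), a sketch of the sparsity trick with hypothetical names (not
Mahout's API): expanding ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2 means
||c||^2 can be computed once per centroid per iteration and ||x||^2 once
per document, leaving only a sparse dot product per distance, so the cost
is proportional to the nonzeros of the document rather than the full
dimensionality.

public class SparseDistance {
  /** Sparse vector stored as parallel index/value arrays. */
  public static final class SparseVector {
    final int[] indices;
    final double[] values;
    SparseVector(int[] indices, double[] values) {
      this.indices = indices;
      this.values = values;
    }
    double normSquared() {
      double s = 0;
      for (double v : values) {
        s += v * v;
      }
      return s;
    }
  }

  /** Squared L2 distance between sparse doc x and dense centroid c, given
   *  precomputed squared norms of both. Touches only nnz(x) entries. */
  static double distanceSquared(SparseVector x, double xNormSq,
                                double[] centroid, double centroidNormSq) {
    double dot = 0;
    for (int k = 0; k < x.indices.length; k++) {
      dot += x.values[k] * centroid[x.indices[k]];
    }
    return xNormSq - 2 * dot + centroidNormSq;
  }

  public static void main(String[] args) {
    double[] centroid = {0.5, 0.0, 0.25, 0.0};
    double cNormSq = 0;
    for (double v : centroid) {
      cNormSq += v * v;                    // once per centroid per iteration
    }
    SparseVector doc =
        new SparseVector(new int[] {0, 2}, new double[] {1.0, 2.0});
    double dNormSq = doc.normSquared();    // once per document
    System.out.println(distanceSquared(doc, dNormSq, centroid, cNormSq));
  }
}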
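
For 3), a sketch of the triangle-inequality pruning in the spirit of
Elkan's k-means speedup (illustrative code, not Mahout's): if
d(x, best) <= d(best, c) / 2, then d(x, c) >= d(best, c) - d(x, best)
>= d(x, best), so centroid c can be skipped without ever computing
d(x, c). The centroid-to-centroid distances are computed once per
iteration and amortized over all documents.

public class TrianglePruning {

  static double distance(double[] a, double[] b) {
    double s = 0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      s += d * d;
    }
    return Math.sqrt(s);
  }

  /** Assigns x to its nearest centroid, skipping centroids ruled out by
   *  the precomputed centroid-to-centroid distance matrix cc. */
  static int assign(double[] x, double[][] centroids, double[][] cc) {
    int best = 0;
    double bestDist = distance(x, centroids[0]);
    for (int c = 1; c < centroids.length; c++) {
      if (cc[best][c] >= 2 * bestDist) {
        continue;   // pruned: by the triangle inequality, c cannot beat best
      }
      double d = distance(x, centroids[c]);
      if (d < bestDist) {
        bestDist = d;
        best = c;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    double[][] centroids = {{0, 0}, {10, 0}, {0, 10}};
    int k = centroids.length;
    double[][] cc = new double[k][k];      // computed once per iteration
    for (int i = 0; i < k; i++) {
      for (int j = 0; j < k; j++) {
        cc[i][j] = distance(centroids[i], centroids[j]);
      }
    }
    double[] x = {1, 1};
    System.out.println(assign(x, centroids, cc));  // prints 0; both other
                                                   // centroids are pruned
  }
}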
