I think the answer is that this is a different beast. It is a fully
distributed computation, and it never has both row vectors of a pair in the
same place at the same time. (Arranging that would be much more expensive,
since you'd have to output the cross product of all rows with themselves.) So
those other measure implementations can't be applied -- or rather, there's a
more efficient way of computing all-pairs similarity here.

You need all cooccurrences since some implementations need that value, and
you're computing all-pairs. (I'm sure you can hack away the cooccurrence
computation if you know your metric doesn't use it.)
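
To make the idea concrete, here is a rough single-machine sketch in Python of
computing all-pairs cosine similarity from cooccurrences. It is not the actual
job code: the function name all_pairs_cosine and the dict-of-dicts input are
made up for illustration, and cosine is just one example metric. The point is
that each "pass" only ever looks at one row or one column at a time, never at
two full row vectors together.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def all_pairs_cosine(rows):
    """rows: dict of row id -> {column: value} (sparse rows; hypothetical format)."""
    # Pass 1: per-row norms, each computed from a single row in isolation.
    norms = {r: sqrt(sum(v * v for v in vec.values())) for r, vec in rows.items()}

    # Pass 2: transpose, so each column lists the rows that have a value in it.
    columns = defaultdict(list)
    for r, vec in rows.items():
        for c, v in vec.items():
            columns[c].append((r, v))

    # Pass 3: two rows sharing a column is one cooccurrence. Summing the
    # value products over all shared columns gives the pair's dot product.
    dots = defaultdict(float)
    for entries in columns.values():
        for (r1, v1), (r2, v2) in combinations(sorted(entries), 2):
            dots[(r1, r2)] += v1 * v2

    # Pass 4: combine the accumulated dot products with the per-row norms.
    sims = {}
    for (r1, r2), dot in dots.items():
        denom = norms[r1] * norms[r2]
        if denom > 0:  # skip rows that only hold explicit zeros
            sims[(r1, r2)] = dot / denom
    return sims


rows = {"a": {0: 1.0, 1: 1.0}, "b": {0: 1.0}, "c": {2: 3.0}}
sims = all_pairs_cosine(rows)
```

Rows that share no column (like "c" above) never produce a pair at all, which
is where the savings over a full cross product come from.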

There are several levers you can pull, including one like Ted mentions --
maxSimilaritiesPerRow.

On Thu, Jul 14, 2011 at 6:17 PM, Grant Ingersoll <[email protected]> wrote:
>
> Any thoughts on why not reuse our existing Distance measures?  Seems like
> once you know that two vectors have something in common, there isn't much
> point in calculating all the co-occurrences, just save of those two (or
> whatever) and then later call the distance measure on the vectors.
>
>
