I don't think that any of these need extra baggage.  They all want sums of
products.  In some cases the things being multiplied are 1's so it doesn't
look like multiplication.

They typically want some row or column metric as well.  LLR wants row AND
column sums.  Cosine wants L_2 row norms.

Euclidean distance is really just an adjusted cosine similarity score
because ||x - y||^2 = ||x||^2 - 2 x.y + ||y||^2.  Thus if you have row
norms and cosine (which together give you the dot products), then you have
Euclidean distance.
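To make that concrete, here is a minimal sketch (illustrative names, not
any Mahout API) showing that once you have the dot products (the "sums of
products") and the L_2 row norms, both cosine and Euclidean distance fall
out with no further passes over the raw data:

```python
import math

# Assumed precomputed quantities: dot products and per-row L2 norms.
# These are the only inputs needed; the raw vectors are not revisited.

def cosine(dot_xy, norm_x, norm_y):
    # cos(x, y) = x.y / (||x|| * ||y||)
    return dot_xy / (norm_x * norm_y)

def euclidean(dot_xy, norm_x, norm_y):
    # ||x - y||^2 = ||x||^2 - 2 x.y + ||y||^2
    return math.sqrt(norm_x ** 2 - 2 * dot_xy + norm_y ** 2)

# Sanity check against the direct computation on two toy rows.
x = [1.0, 2.0, 3.0]
y = [4.0, 0.0, 1.0]
dot_xy = sum(a * b for a, b in zip(x, y))
norm_x = math.sqrt(sum(a * a for a in x))
norm_y = math.sqrt(sum(b * b for b in y))

direct = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
assert abs(euclidean(dot_xy, norm_x, norm_y) - direct) < 1e-9
```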

On Mon, Jul 18, 2011 at 3:40 PM, Sean Owen <[email protected]> wrote:

> On Mon, Jul 18, 2011 at 11:28 PM, Jake Mannix <[email protected]>
> wrote:
> > Yeah, I guess I see that.  Which similarity measures require all this
> > extra baggage?
>
> They all do in some form. For example, log-likelihood is based on
> counts. Cosine measure would need a bunch of products. Euclidean
> distance would need to save a bunch of differences. So none of those
> are calculated early; they're calculated later from the original 'raw
> data'.
>
