On Mon, Jul 18, 2011 at 11:28 PM, Jake Mannix <[email protected]> wrote:
> Yeah, I guess I see that. Which similarity measures require all this
> extra baggage?

They all do in some form. For example, log-likelihood is based on counts. Cosine measure would need a bunch of products. Euclidean distance would need to save a bunch of differences. So none of those are calculated early; they're calculated later from the original 'raw data'.
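
To make that concrete, here is a minimal sketch of what each measure needs from the raw data. It assumes plain Python lists stand in for the raw vectors; the function names are hypothetical and this is not Mahout's actual code. The log-likelihood part follows the usual entropy-based formulation over a 2x2 table of counts.

```python
import math

def cosine(x, y):
    # Needs the element-wise products (dot product and both norms).
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def euclidean_distance(x, y):
    # Needs the element-wise differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def cooccurrence_counts(x, y):
    # Needs counts: how often both, only one, or neither entry is non-zero.
    k11 = sum(1 for a, b in zip(x, y) if a and b)
    k12 = sum(1 for a, b in zip(x, y) if a and not b)
    k21 = sum(1 for a, b in zip(x, y) if b and not a)
    k22 = sum(1 for a, b in zip(x, y) if not a and not b)
    return k11, k12, k21, k22

def _x_log_x(n):
    return n * math.log(n) if n > 0 else 0.0

def _entropy(*counts):
    # Unnormalized entropy of a set of counts.
    return _x_log_x(sum(counts)) - sum(_x_log_x(c) for c in counts)

def log_likelihood_ratio(k11, k12, k21, k22):
    # Computed purely from the four counts of the 2x2 table.
    row = _entropy(k11 + k12, k21 + k22)
    col = _entropy(k11 + k21, k12 + k22)
    mat = _entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))
```

In each case the intermediate quantities (products, differences, counts) are derived from the raw vectors at the point where the similarity is computed, e.g. `log_likelihood_ratio(*cooccurrence_counts(x, y))`.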
