On Mon, Nov 9, 2009 at 4:57 AM, Sean Owen <[email protected]> wrote:

> Ted will say, and again I agree, that Pearson is not usually the
> best similarity metric, though it is widely mentioned in collaborative
> filtering examples and literature.
>

You said it! I don't need to.

> What Ted quotes below is implemented in the framework as
> LogLikelihoodSimilarity. For that, I believe it *is* the pairs with
> the largest resulting similarity score that you do want to keep. Or at
> least it is more reasonable. Ted maybe you can check my thinking on
> that.
>

Yes. And you don't even need the score in the end, just the fact that it
passed the threshold. I typically weight the pairing by the IDF score of the
source item.

-- 
Ted Dunning, CTO
DeepDyve
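
For readers following the thread, here is a minimal sketch of the idea discussed above: score each co-occurring item pair with a log-likelihood ratio, keep only the pairs that pass a threshold, then discard the score and weight each kept pair by the IDF of its source item. This is not the framework's code. The function names, the threshold value, and the choice of IDF as log(users / users who have the item) are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import log


def x_log_x(x):
    # x * ln(x), with the usual convention that 0 * ln(0) = 0.
    return 0.0 if x == 0 else x * log(x)


def entropy(*counts):
    # Unnormalized entropy of a list of counts.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    # Log-likelihood ratio for a 2x2 contingency table:
    # k11 = users with both items, k12 = only the first,
    # k21 = only the second, k22 = neither.
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))


def keep_pairs(user_items, threshold):
    # user_items: dict mapping user -> iterable of items (hypothetical input shape).
    # Returns {(source, target): idf(source)} for pairs passing the threshold.
    n_users = len(user_items)
    item_count = Counter()
    pair_count = Counter()
    for items in user_items.values():
        items = set(items)
        item_count.update(items)
        pair_count.update(combinations(sorted(items), 2))

    kept = {}
    for (a, b), k11 in pair_count.items():
        k12 = item_count[a] - k11
        k21 = item_count[b] - k11
        k22 = n_users - k11 - k12 - k21
        if llr(k11, k12, k21, k22) >= threshold:
            # The LLR score is dropped; only the IDF weight is stored.
            kept[(a, b)] = log(n_users / item_count[a])
            kept[(b, a)] = log(n_users / item_count[b])
    return kept
```

Note that the threshold only decides whether a pair is kept; the stored weight depends on how rare the source item is, not on how large the score was.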
