Question about log-likelihood formulation

Sean Owen Sat, 29 Jan 2011 06:43:42 -0800

In LogLikelihoodSimilarity, you'll find a function safeLog() which returns
0.0, rather than -Infinity or NaN, when the log of a non-positive number is computed.
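A minimal Java sketch of such a helper follows. It assumes the helper simply tests d <= 0.0, and the actual Mahout code may differ. It also shows the raw Math.log() results being guarded against. In a log-likelihood term of the form k * log(p), a zero count multiplied by log(0) = -Infinity gives NaN. That is presumably where the NaN comes from. The class name SafeLogDemo is hypothetical.

```java
/** Hypothetical demo class; safeLog() mirrors the behavior described above. */
class SafeLogDemo {

  /** Returns 0.0 instead of -Infinity (for 0) or NaN (for negatives). */
  static double safeLog(double d) {
    return d <= 0.0 ? 0.0 : Math.log(d);
  }

  public static void main(String[] args) {
    // Raw Math.log(): log(0) is -Infinity, log of a negative number is NaN.
    System.out.println(Math.log(0.0));        // -Infinity
    System.out.println(Math.log(-1.0));       // NaN
    // A term like k * log(p) with k == 0 and p == 0 becomes 0 * -Infinity = NaN.
    System.out.println(0.0 * Math.log(0.0));  // NaN
    // With safeLog() the same term is simply 0.0.
    System.out.println(0.0 * safeLog(0.0));   // 0.0
  }
}
```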


It creates an asymmetry in corner cases. For example, imagine we have 5
users. All 5 are associated with item A; all but one are associated with item B.
The similarity between 1 and 2 is 0.0, but the similarity between 2 and 1 is
NaN.

Taking off this safety feature makes both NaN. I think neither choice is more
or less theoretically defensible than the other. But in practice, it's
slightly bad, as it means no similarity is available in some edge cases.
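The following hypothetical Java reconstruction of a binomial log-likelihood-ratio item similarity makes the corner case concrete. The helper names (logMaybeSafe, logL, twoLogLambda), the argument mapping in similarity(), and the 1 - 1/(1 + LLR) scaling are all assumptions for illustration, not necessarily what LogLikelihoodSimilarity does. The safe flag toggles safeLog()-style behavior.

```java
/**
 * Hypothetical reconstruction of a binomial log-likelihood-ratio similarity,
 * used only to reproduce the asymmetry described above. Method names and the
 * argument mapping in similarity() are assumptions, not Mahout's actual code.
 */
class LlrAsymmetryDemo {

  /** With safe == true, mimics safeLog(): non-positive input yields 0.0. */
  static double logMaybeSafe(double d, boolean safe) {
    if (safe && d <= 0.0) {
      return 0.0;
    }
    return Math.log(d);
  }

  /** Binomial log-likelihood of k successes in n trials with probability p. */
  static double logL(double p, double k, double n, boolean safe) {
    return k * logMaybeSafe(p, safe) + (n - k) * logMaybeSafe(1.0 - p, safe);
  }

  /** 2 * log(lambda) comparing two binomials: k1 of n1 versus k2 of n2. */
  static double twoLogLambda(double k1, double k2, double n1, double n2, boolean safe) {
    double p = (k1 + k2) / (n1 + n2);
    return 2.0 * (logL(k1 / n1, k1, n1, safe) + logL(k2 / n2, k2, n2, safe)
        - logL(p, k1, n1, safe) - logL(p, k2, n2, safe));
  }

  /**
   * Similarity of item 1 to item 2 from user counts (assumed mapping):
   * users with item 1 among users with item 2, versus among users without it.
   */
  static double similarity(int prefer1, int prefer2, int preferBoth, int numUsers,
                           boolean safe) {
    double llr = twoLogLambda(preferBoth, prefer1 - preferBoth,
        prefer2, numUsers - prefer2, safe);
    return 1.0 - 1.0 / (1.0 + llr);
  }

  public static void main(String[] args) {
    int numUsers = 5;
    int preferA = 5;     // all 5 users are associated with item A
    int preferB = 4;     // all but one are associated with item B
    int preferBoth = 4;  // users associated with both
    for (boolean safe : new boolean[] {true, false}) {
      System.out.println("safe=" + safe
          + "  sim(A,B)=" + similarity(preferA, preferB, preferBoth, numUsers, safe)
          + "  sim(B,A)=" + similarity(preferB, preferA, preferBoth, numUsers, safe));
    }
  }
}
```

Under these assumptions it should print 0.0 and NaN for the two directions with the safe log, and NaN for both without it. The 0.0 direction comes from 0 * log(0) terms, which safeLog() zeroes out. The NaN direction comes from a second binomial with zero trials (0/0 as its probability), which no log guard can fix.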

My intuition says we should be wary of edge cases, since they can produce
odd results. So part of me thinks it's good to turn these into NaN and ignore
them. After all, they currently come out as only 0.0 similar anyway.

Is my intuition roughly right?
