The score that is called rootLogLikelihood in the Mahout code is what you are looking for.

Since LLR is chi^2 distributed (asymptotically, without dependence, blah, blah), the square root is distributed like the absolute value of a normally distributed random variable. If you attach a sign according to whether k_11 is larger or smaller than what you would expect from k_12, k_21, and k_22, then you get a really handy normally distributed score.

For the purposes we normally talk about, you want to look for a large, positive value of this score. A large negative value means the events avoid each other. Since the score is already normally distributed, you can use terms like 3 sigma and people kind of sort of already understand the scale (6 sigma is really big, 1 sigma not so much, 18 sigma is outrageously large).

On Wed, Aug 13, 2014 at 4:42 PM, Dmitriy Lyubimov wrote:

> Hello,
>
> I would be grateful for a hint on the following problem in cooccurrence
> analysis. It may not be the most practical one, but it appeared in a test.
>
> The problem is that LLR tests for independence. As such, it gives high
> scores for negatively correlated events too. E.g. countA = 91,
> countB = 91, countA&B = 1, total = 213 produces a sky-high LLR of 139.33.
> However, in this situation the events avoid each other (something we are
> not looking for) rather than being highly likely to co-occur (something
> we are looking for).
>
> Is there a quick test to filter out negatively co-occurring events?
>
> thanks.
> -d
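
For reference, here is a rough Python sketch of the signed root score described in the reply, applied to the counts from the question. It is not the Mahout code itself: the function names are made up for illustration, and the sign rule (compare k_11/(k_11+k_12) against k_21/(k_21+k_22)) is my reading of what rootLogLikelihood does, so treat it as an assumption. The 2x2 table is derived from the question as k_11 = 1, k_12 = 90, k_21 = 90, k_22 = 213 - 91 - 91 + 1 = 32.

```python
import math


def x_log_x(x):
    # Convention: 0 * log(0) is taken to be 0.
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized entropy of a set of counts.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    # Log-likelihood ratio (G^2) for a 2x2 contingency table.
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp at zero to guard against tiny negative values from round-off.
    return max(0.0, 2.0 * (row + col - mat))


def root_llr(k11, k12, k21, k22):
    # Signed square root of LLR: positive when the two events co-occur
    # more often than expected, negative when they avoid each other.
    # Assumes both rows of the table are non-empty.
    score = math.sqrt(llr(k11, k12, k21, k22))
    if k11 / (k11 + k12) < k21 / (k21 + k22):
        score = -score
    return score


# Counts from the question: countA = 91, countB = 91, countA&B = 1, total = 213.
k11, k12, k21, k22 = 1, 90, 90, 32
print(llr(k11, k12, k21, k22))       # roughly 139.3, as reported in the question
print(root_llr(k11, k12, k21, k22))  # roughly -11.8, i.e. strongly negative
```

With a score like this, filtering out negatively co-occurring pairs comes down to keeping only pairs whose signed score is above some positive threshold.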
