The score that is called rootLogLikelihood in the Mahout code is what you
are looking for.

Since LLR is chi^2 distributed with one degree of freedom (asymptotically,
under independence, blah, blah), its square root is distributed like the
absolute value of a standard normal random variable.  If you attach a sign
according to whether k_11 is larger or smaller than what you would expect
from k_12, k_21, and k_22, then you get a really handy normally distributed
score.
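Here is a minimal Python sketch of that signed score. It assumes the usual 2x2 layout: k_11 counts both events, k_12 is A without B, k_21 is B without A, and k_22 is neither. The entropy form of LLR and the sign test (k_11 * k_22 < k_12 * k_21, equivalent to comparing k_11/(k_11+k_12) with k_21/(k_21+k_22)) are modeled loosely on Mahout's LogLikelihood class. This is not Mahout's code, and table_from_counts is a hypothetical helper for turning your marginal counts into a table.

```python
import math


def x_log_x(x: int) -> float:
    """x * ln(x), with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts: int) -> float:
    """Count-weighted entropy: N*ln(N) - sum(k*ln(k)), i.e. N times Shannon entropy."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11: int, k12: int, k21: int, k22: int) -> float:
    """G^2 log-likelihood ratio statistic for a 2x2 contingency table."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row + col - mat))


def root_llr(k11: int, k12: int, k21: int, k22: int) -> float:
    """Signed sqrt(LLR): negative when A and B co-occur less than independence predicts."""
    score = math.sqrt(llr(k11, k12, k21, k22))
    return -score if k11 * k22 < k12 * k21 else score


def table_from_counts(count_a: int, count_b: int, count_ab: int, total: int):
    """Hypothetical helper: marginal counts -> (k11, k12, k21, k22)."""
    k11 = count_ab                               # both A and B
    k12 = count_a - count_ab                     # A without B
    k21 = count_b - count_ab                     # B without A
    k22 = total - count_a - count_b + count_ab   # neither
    return k11, k12, k21, k22


k = table_from_counts(91, 91, 1, 213)
print(k, round(llr(*k), 2), round(root_llr(*k), 2))
```

With your numbers the table is (1, 90, 90, 32), the LLR comes out at 139.33 as you found, and the signed score is about -11.8, so looking only for large positive scores rejects this pair.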

For the purposes we normally talk about, you want to look for a large,
positive value of this score.  A pair like the one in your example, which
co-occurs far less often than chance would predict, gets a large negative
score instead, so it drops out.  Since the score is already normally
distributed, you can use terms like 3 sigma and people already more or less
understand the scale (6 sigma is really big, 1 sigma not so much, 18 sigma
is outrageously large).
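To make the "sigma" reading concrete, here is a small Python sketch that converts a signed root-LLR score into a one-sided normal tail probability and keeps only pairs above a sigma threshold. It assumes the asymptotic normal approximation holds, which gets shaky in the far tails and with small counts, so treat very large sigmas as "huge" rather than as exact probabilities. keep_positive_pairs and its min_sigma default of 3 are hypothetical choices for illustration.

```python
import math


def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def keep_positive_pairs(scored, min_sigma=3.0):
    """Hypothetical filter: keep (pair, score) entries with score >= min_sigma.

    Negatively associated pairs have negative scores, so they never pass.
    """
    return [(pair, z) for pair, z in scored if z >= min_sigma]


# One-sided tail probabilities for a few sigma levels (about 0.16, 0.0013, 1e-9).
for sigma in (1, 3, 6):
    print(sigma, upper_tail(sigma))
```

Because the threshold is one-sided, no separate test for "avoidance" is needed: the sign already carries that information.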




On Wed, Aug 13, 2014 at 4:42 PM, Dmitriy Lyubimov wrote:

> Hello,
>
> I would be grateful for a hint on the following problem in cooccurrence
> analysis. It may not be the most practical one, but it came up in a test.
>
> The problem is that LLR tests for independence. As such, it also gives high
> scores to negatively correlated events. E.g., countA = 91, countB = 91,
> countA&B = 1, total = 213 produces a sky-high LLR of 139.33.
> However, in this situation these events avoid each other (something we are
> not looking for) rather than being highly likely to co-occur (something we
> are looking for).
>
> Is there a quick test to filter out negatively co-occurring events?
>
> thanks.
> -d
>
