Re: LLR Scoring question

Shashikant Kore Wed, 13 Jan 2010 09:01:10 -0800

Ted,

Thank you for the tip.


>
>    rootLLR = signum(k11/k1* - k21/k2*) * sqrt(LLR)
>

I wasn't sure what k1* and k2* stand for, so I used the row sums
(k11+k12) and (k21+k22) as the denominators. That gives the correct
result.
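
For reference, here is a minimal Python sketch of the signed score with
that reading of the denominators. It assumes a 2x2 table in which row 1
is the cluster, row 2 is everything outside it, and column 1 counts
documents containing the term; that layout and the function names are
my own assumptions, not something fixed in this thread. The LLR itself
is computed with the usual entropy formulation of the G^2 statistic.

```python
import math


def x_log_x(x):
    # Convention: 0 * log(0) is treated as 0.
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized Shannon entropy of a set of counts.
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    # Log-likelihood ratio (G^2) for a 2x2 contingency table.
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row + col - mat))


def root_llr(k11, k12, k21, k22):
    # Signed square root of LLR: positive when the term's rate in row 1
    # exceeds its rate in row 2, negative otherwise.
    # Assumes both row totals are non-zero.
    score = math.sqrt(llr(k11, k12, k21, k22))
    if k11 / (k11 + k12) < k21 / (k21 + k22):
        score = -score
    return score
```

A term that is over-represented in the cluster then gets a positive
score, and one that is under-represented by the same margin gets the
same magnitude with a negative sign.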

--shashi

On Wed, Jan 13, 2010 at 12:50 AM, Ted Dunning <[email protected]> wrote:
> Raw LLR has a large value whenever there is an anomaly, in either
> direction.  In this case, term2 is rare in the cluster and common outside
> and is thus an anomaly.
>
> One thing that I do is to use a variant of the LLR score:
>
>    rootLLR = signum(k11/k1* - k21/k2*) * sqrt(LLR)
>
> This score has two advantages over the basic LLR:
>
> a) it is positive where k11 is bigger than expected, negative where it is
> lower.  This resolves your current problem.
>
> b) if there is no difference it is asymptotically normally distributed.
> This allows people to talk about "number of standard deviations" which is a
> more common frame of reference than the chi^2 distribution.
>
>
> On Tue, Jan 12, 2010 at 4:49 AM, Shashikant Kore <[email protected]> wrote:
>
>> As far as I can see, Term1 is rare outside the cluster but common
>> inside it (relatively speaking). But when I calculate LLR scores,
>> Term1's score (3569) is lower than Term2's (3622). This looks
>> counter-intuitive to me. Is it the case that the LLR score is higher
>> if a term is common outside the cluster and rare inside? Can this be
>> "fixed"?
>>
>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>
