I don't have my code here to verify the result. Can you show the calculation
here, i.e. the values of the logs and so on? That might give a better idea.


On Tue, Jan 12, 2010 at 6:19 PM, Shashikant Kore <[email protected]> wrote:

> Hi,
>
> I am looking at LLR scores for two terms in a cluster which seem
> non-intuitive to me.
>
> The corpus size is 706,120 docs and the size of the cluster is 21,964 docs.
>
> Term1 appears in 904 docs in the cluster and 1,144 docs outside the
> cluster.
> Term2 appears in 36 docs in the cluster and 60,280 docs outside the
> cluster.
>
> As far as I can see, Term1 is rare outside the cluster but common in
> the cluster (relatively speaking). But when I calculate LLR scores,
> Term1's score (3569) is lower than that of Term2 (3622). This looks
> counter-intuitive to me. Is it the case that the LLR score is higher
> if a term is common outside the cluster and rare inside? Can this be
> "fixed"?
>
> The k11, k12, k21, k22 values for Term1 and Term2 are as follows, if
> you wish to calculate.
>
> Term1
> k11     904
> k12     21060
> k21     1144
> k22     683012
>
> Term2
> k11     36
> k12     21928
> k21     60280
> k22     623876
>
> Thanks,
>
> --shashi
>
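
For reference, here is a minimal Python sketch of the calculation asked for
above. It assumes the usual 2x2 contingency-table form of the log-likelihood
ratio, G^2 = 2 * sum(k_ij * ln(k_ij / E_ij)), with E_ij = row_i * col_j / N.
The function names llr_terms and llr are made up for illustration and are not
taken from anyone's code in this thread.

```python
from math import log


def llr_terms(k11, k12, k21, k22):
    """Return (observed, expected, k * ln(k / expected)) for each cell.

    Cells are ordered k11, k12, k21, k22. The layout is assumed to be
    rows = in cluster / outside cluster, columns = term present / absent.
    """
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    cells = ((k11, k12), (k21, k22))
    terms = []
    for i in range(2):
        for j in range(2):
            k = cells[i][j]
            expected = rows[i] * cols[j] / n
            # An empty cell contributes nothing (limit of k * ln k as k -> 0).
            term = k * log(k / expected) if k > 0 else 0.0
            terms.append((k, expected, term))
    return terms


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 table."""
    return 2.0 * sum(term for _, _, term in llr_terms(k11, k12, k21, k22))


# Counts copied from the message above.
tables = {
    "Term1": (904, 21060, 1144, 683012),
    "Term2": (36, 21928, 60280, 623876),
}

for name, counts in tables.items():
    print(name)
    for label, (k, expected, term) in zip(
        ("k11", "k12", "k21", "k22"), llr_terms(*counts)
    ):
        print(f"  {label}: observed={k} expected={expected:.2f} "
              f"k*ln(k/E)={term:.2f}")
    print(f"  LLR = {llr(*counts):.2f}")
```

With the counts above this should land close to the quoted 3569 and 3622.
The statistic measures how far the observed counts are from the expected
ones, not the direction of the difference. Comparing k11 with its expected
value (roughly 64 for Term1, roughly 1876 for Term2) shows which way each
term deviates.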
