[ngram] Re: Some questions about Text-NSP

Ted Pedersen tpede...@d.umn.edu [ngram] Sun, 25 Nov 2018 06:58:11 -0800

Hi Blk,

Thanks for pointing these out. On the Poisson-Stirling measure, I
think the reason we haven't included log n is that, within a single
corpus, log n is simply a constant (the log of the total number of
bigrams), so dividing by it would not change the rankings that we get
from these scores. That said, if you were comparing scores across
corpora of different sizes, then the denominator would likely be
important to include.
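
To make the ranking argument concrete, here is a minimal Python sketch
(not the Text-NSP code itself). It assumes m11 = n1p * np1 / npp as the
expected count and takes n to be the total number of bigrams, npp. The
function name, the `normalize` flag, and the counts are all made up for
illustration.

```python
import math

def poisson_stirling(n11, n1p, np1, npp, normalize=False):
    """Poisson-Stirling score for one bigram (requires n11 > 0).

    n11: observed bigram count; n1p, np1: marginal counts of the two
    words; npp: total number of bigrams in the corpus.
    """
    m11 = n1p * np1 / npp  # expected count under independence (assumed)
    score = n11 * (math.log(n11) - math.log(m11) - 1)
    # Optional division by log n, with n taken to be npp (an assumption).
    return score / math.log(npp) if normalize else score

# Hypothetical counts: (n11, n1p, np1) for each bigram in one corpus.
NPP = 100000
bigrams = {
    "new york": (50, 120, 80),
    "of the": (300, 4000, 6000),
    "strong tea": (8, 40, 30),
}

def ranking(normalize):
    """Bigrams sorted from highest to lowest score."""
    return sorted(
        bigrams,
        key=lambda b: poisson_stirling(*bigrams[b], NPP, normalize=normalize),
        reverse=True,
    )

raw_order = ranking(normalize=False)
norm_order = ranking(normalize=True)
print(raw_order == norm_order)
```

Because log(npp) is the same positive number for every bigram in the
corpus, the two orderings agree. The score values themselves do differ,
which is why the denominator starts to matter once npp varies between
corpora.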


Thanks for pointing out the typos. Text-NSP is right now in a fairly
dormant state, but I do have a list of small changes to make and will
add yours to these.
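
For reference, a small Python sketch of phi^2 with the n12 * n21 term in
the numerator. As a sanity check it compares the result against the
textbook Pearson statistic, sum of (observed - expected)^2 / expected,
divided by the total count. That textbook form is an assumption here
and is not quoted from the Text-NSP documentation; the function names
and the counts are hypothetical.

```python
import math

def phi_squared(n11, n12, n21, n22):
    """Squared phi coefficient for a 2x2 contingency table."""
    n1p, n2p = n11 + n12, n21 + n22  # row totals
    np1, np2 = n11 + n21, n12 + n22  # column totals
    return (n11 * n22 - n12 * n21) ** 2 / (n1p * np1 * np2 * n2p)

def pearson_chi_squared(n11, n12, n21, n22):
    """Textbook Pearson chi-squared: sum of (observed - expected)^2 / expected."""
    n1p, n2p = n11 + n12, n21 + n22
    np1, np2 = n11 + n21, n12 + n22
    npp = n1p + n2p
    observed = [n11, n12, n21, n22]
    expected = [n1p * np1 / npp, n1p * np2 / npp,
                n2p * np1 / npp, n2p * np2 / npp]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical 2x2 counts: n11, n12, n21, n22.
table = (10, 20, 30, 940)
npp = sum(table)

# For a 2x2 table, phi^2 equals chi-squared divided by the total count.
print(math.isclose(phi_squared(*table), pearson_chi_squared(*table) / npp))
```

With n21 * n21 in the numerator instead, the identity with chi-squared
no longer holds for tables where n12 and n21 differ.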

Thanks for your interest, and please let us know if you have any other
questions.

Cordially,
Ted
---
Ted Pedersen
http://www.d.umn.edu/~tpederse

On Sun, Nov 25, 2018 at 4:13 AM BLK Serene <blkser...@gmail.com> wrote:
>
> Hi, I have some questions about the association measures implemented in 
> Text-NSP:
>
> The Poisson-Stirling measure given in the documentation is:
> Poisson-Stirling = n11 * ( log(n11) - log(m11) - 1)
>
> But in Quasthoff's paper the formula given by the author is:
> sig(A, B) = (k * (log k - log λ - 1)) / log n
>
> I'm a little confused since I know little about math or statistics. Why is 
> the denominator omitted here?
>
> And some typos in the doc:
> square of phi coefficient:
> PHI^2 = ((n11 * n22) - (n21 * n21))^2/(n1p * np1 * np2 * n2p)
> where n21 *n21 should be n12 * n21
>
> chi-squared test:
> Pearson's Chi-squred test measures the devitation (should be deviation) 
> between
>
> Pearson's Chi-Squared = 2 * [((n11 - m11)/m11)^2 + ((n12 - m12)/m12)^2 +
>                              ((n21 - m21)/m21)^2 + ((n22 -m22)/m22)^2]
> should be: ((n11 - m11)/m11)^2 + ((n12 - m12)/m12)^2 +
>                    ((n21 - m21)/m21)^2 + ((n22 -m22)/m22)^2
>
> And chi2: same as above.
>
> Thanks in advance.
