Thanks for your question, Ronojoy.

At present statistic.pl only supports the test of independence for the
3-d log-likelihood measure, and doesn't support the kinds of tests
that you are interested in doing. To be clear, as you seemed to
suspect, it only supports:

p(w1,w2,w3) = p(w1)*p(w2)*p(w3)
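
To make this model concrete, here is a minimal Python sketch of the
log-likelihood ratio (G^2) for the independence hypothesis above, computed
from a 2x2x2 table. This is not NSP's implementation: the function name and
the argument names (n111, n1pp, ...) are made up for illustration, and the
sketch assumes that every marginal count for the trigram is available.

```python
import math

def trigram_ll_independence(n111, n1pp, np1p, npp1, n11p, n1p1, np11, nppp):
    """G^2 for H0: p(w1,w2,w3) = p(w1)*p(w2)*p(w3) (illustrative sketch)."""
    # Observed cells; index 1 = the word occurs at that position, 2 = it does not.
    obs = {}
    obs[1, 1, 1] = n111
    obs[1, 1, 2] = n11p - n111
    obs[1, 2, 1] = n1p1 - n111
    obs[2, 1, 1] = np11 - n111
    obs[1, 2, 2] = n1pp - n11p - n1p1 + n111
    obs[2, 1, 2] = np1p - n11p - np11 + n111
    obs[2, 2, 1] = npp1 - n1p1 - np11 + n111
    obs[2, 2, 2] = nppp - sum(obs.values())

    # Single-position marginals: present / absent for each of the three slots.
    m1 = {1: n1pp, 2: nppp - n1pp}
    m2 = {1: np1p, 2: nppp - np1p}
    m3 = {1: npp1, 2: nppp - npp1}

    g2 = 0.0
    for (i, j, k), o in obs.items():
        if o > 0:  # a zero cell contributes nothing to the sum
            expected = m1[i] * m2[j] * m3[k] / nppp ** 2
            g2 += o * math.log(o / expected)
    return 2.0 * g2
```

Every one of the seven marginal counts enters the calculation, so dropping
any of them leaves the table underdetermined for this model.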

Note that count.pl does take advantage of the set_freq_combo values,
so it is possible to considerably improve the performance of counting
for longer n-grams by only retaining those marginal totals that you
are interested in (via appropriate setting of set_freq_combo).
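
As a rough illustration of working with a reduced set of marginals, the
Python sketch below pairs the frequency values on one line of count.pl
output with the lines of the combo file. It assumes an output line has the
layout token<>token<>token<>f1 f2 ..., with the frequencies in the same
order as the combo file; please check that against your own output. The
function name is hypothetical and not part of NSP.

```python
def parse_count_line(line, combos):
    """Return (tokens, {combo: frequency}) for one n-gram line (assumed layout)."""
    # Everything before the last "<>" is a token; the last field holds the counts.
    *tokens, freq_field = line.rstrip("\n").split("<>")
    freqs = [int(value) for value in freq_field.split()]
    if len(freqs) != len(combos):
        raise ValueError("number of frequency values does not match the combo file")
    return tokens, dict(zip(combos, freqs))
```

For example, with combos ["0 1 2", "0", "1 2"], a line such as
"a<>b<>c<>5 7 6 " would give the tokens a, b, c and one count per combo.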

Sorry about that - unfortunately at this point I don't think it's
likely we'll be adding this functionality any time soon, although we'd
be happy for anyone who is interested to take this on as a project. :)
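
For anyone who does want to take this on, the hypothesis
P(w1w2w3) = P(w1)*P(w2w3) from the message below reduces to a 2x2 table:
w1 at the first position (or not) against w2w3 at the last two positions
(or not). The following is a minimal Python sketch of the log-likelihood
ratio for that table. It is not part of NSP; the function and argument
names are invented for illustration. It uses only the trigram count, the
count of w1 at position 0, the count of w2w3 at positions 1 2, and the total.

```python
import math

def trigram_ll_w1_vs_w2w3(n111, n1pp, np11, nppp):
    """G^2 for H0: p(w1,w2,w3) = p(w1)*p(w2,w3) (illustrative sketch)."""
    # Observed 2x2 table: rows = w1 present/absent, columns = w2w3 present/absent.
    observed = [
        [n111, n1pp - n111],
        [np11 - n111, nppp - n1pp - np11 + n111],
    ]
    row_totals = [n1pp, nppp - n1pp]
    col_totals = [np11, nppp - np11]

    g2 = 0.0
    for r in range(2):
        for c in range(2):
            o = observed[r][c]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = row_totals[r] * col_totals[c] / nppp
                g2 += o * math.log(o / expected)
    return 2.0 * g2
```

A value near zero means the observed trigram count is close to what
P(w1)*P(w2w3) predicts; larger values indicate a stronger departure.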

Cordially,
Ted

On Sun, Dec 6, 2009 at 2:20 PM, Ted Pedersen <duluth...@gmail.com> wrote:
> ---------- Forwarded message ----------
> From: r...@imsc.res.in <r...@imsc.res.in>
> Date: Dec 5, 2009 11:59 AM
> Subject: Re: nsp/trigram significance
> To: Ted Pedersen <duluth...@gmail.com>
>
>
> May I ask you to forward this to the list? I don't use Yahoo, and it
> seems posting to the mailing list is not allowed without a Yahoo
> account. Could you please cc me on the reply? Thanks.
>
> ------
>
> To outline what I did:
>
> The following steps *work* :
>
> (1) Corpus of 417 distinct tokens, corpus size is about 10,000 tokens,
> in corpus.txt
>
> (2) count.pl --ngram 3 --set_freq_combo combofile.txt corpus-3.cnt corpus.txt
>
> (3) statistic.pl --ngram 3 --set_freq_combo combofile.txt ll.pm
> corpus-3sig.txt corpus-3.cnt
>
> where combofile.txt is of the form
>
> 0 1 2
> 0
> 1
> 2
> 0 1
> 0 2
> 1 2
>
> The following *does not work* :
>
> Steps 1 - 3, but with combofile.txt as
>
> 0 1 2
>
> 1
> 0 1
>
> or any variant of the above. I want to test the hypothesis P(w1w2w3) =
> P(w1)*P(w2w3) and it appears from the doc that playing around with the
> frequency combinations is the way to go about doing this. I couldn't
> get it to work, however.
>
>
>
>
> Quoting Ted Pedersen <duluth...@gmail.com>:
>
>> Thanks for your query. Could you send me the exact command you are
>> running? That would be helpful in understanding what is happening.
>> Also, if you could send this to the ngram mailing list rather than me
>> directly, that would be helpful as I'm sure other users would be
>> interested.
>>
>> http://tech.groups.yahoo.com/group/ngram/
>>
>> Thanks!
>> Ted
>>
>> On Sat, Dec 5, 2009 at 5:13 AM, Ronojoy Adhikari <r...@imsc.res.in> wrote:
>>
>> >
>> > Dear Prof. Pedersen,
>> >
>> > I am a user of your NSP software and I am taking the liberty of bothering
>> > you with a query.
>> >
>> > I have been trying to test alternative hypotheses for independence of
>> > trigrams using the --set_freq_combo flag in count.pl and statistic.pl.
>> > While this works fine for count.pl, statistic.pl invariably leads to an
>> > error message of the form:
>> >
>> > Frequency combination "x" missing!
>> >
>> > where "x" could be any of "0 1 2", "0", "1", "2", or "0 1" and
>> > permutations. I have tried every possible combination of these and the
>> > only combination which works is when the full set
>> >
>> > 0 1 2
>> > 0
>> > 1
>> > 2
>> > 0 1
>> > 0 2
>> > 1 2
>> >
>> > is specified in the --set_freq_combo file. Am I doing something wrong, or
>> > does NSP only do the default trigram hypothesis test of
>> > P(w1w2w3)=P(w1)*P(w2)*P(w3)? Your help would be much appreciated.
>> >
>> > Thanks in advance,
>> >
>> > Ronojoy Adhikari.
>> >
>> > ________________________________________________________________________________
>> > Dr. Ronojoy Adhikari
>> > The Institute of Mathematical Sciences  Tel: +91(44)2254 3253
>> > Chennai 600113 India                    Fax: +91(44)2254 1586
>> > email:r...@imsc.res.in                  URL: http://www.imsc.res.in/~rjoy
>> > ________________________________________________________________________________
>> >
>> >
>> >
>>
>>
>>
>> --
>> Ted Pedersen
>> http://www.d.umn.edu/~tpederse
>>
>>
>
>
>
> ----------------------------------------------------------------
> This message was sent using IMP, the Internet Messaging Program.
>
>
>
> --
> Ted Pedersen
> http://www.d.umn.edu/~tpederse
>



-- 
Ted Pedersen
http://www.d.umn.edu/~tpederse
