Thanks for your question, Ronojoy. At present statistic.pl only supports the test of independence for the 3-d log-likelihood measure, and doesn't support the kinds of tests that you are interested in doing. To be clear, as you seemed to suspect, it only supports the default hypothesis:

    p(w1,w2,w3) = p(w1)*p(w2)*p(w3)

Note that count.pl does take advantage of the set_freq_combo values, so it is possible to considerably improve the performance of counting for longer n-grams by retaining only the marginal totals that you are interested in (via an appropriate setting of set_freq_combo).

Sorry about that. Unfortunately, at this point I don't think it's likely we'll be adding this functionality any time soon, although we'd be happy for anyone who is interested to take this on as a project. :)

Cordially,
Ted

On Sun, Dec 6, 2009 at 2:20 PM, Ted Pedersen <duluth...@gmail.com> wrote:
> ---------- Forwarded message ----------
> From: r...@imsc.res.in <r...@imsc.res.in>
> Date: Dec 5, 2009 11:59 AM
> Subject: Re: nsp/trigram signficance
> To: Ted Pedersen <duluth...@gmail.com>
>
> May I request you to forward this to the list? I don't use Yahoo and it
> seems posting to the mailing list is disallowed without a Yahoo
> account. Please could you cc me the reply? Thanks.
>
> ------
>
> To outline what I did.
>
> The following steps *work*:
>
> (1) Corpus of 417 distinct tokens, corpus size is about 10,000 tokens,
> in corpus.txt
>
> (2) count.pl --ngram 3 --set_freq_combo combofile.txt corpus-3.cnt corpus.txt
>
> (3) statistic.pl --ngram 3 --set_freq_combo combofile.txt ll.pm corpus-3sig.txt corpus-3.cnt
>
> where combofile.txt is of the form
>
> 0 1 2
>
> 1
> 2
> 0 1
> 0 2
> 1 2
>
> The following *does not work*:
>
> Steps 1-3, but with combofile.txt as
>
> 0 1 2
>
> 1
> 0 1
>
> or any variant of the above. I want to test the hypothesis P(w1w2w3) =
> P(w1)*P(w2w3) and it appears from the doc that playing around with the
> frequency combinations is the way to go about doing this. I couldn't
> get it to work, however.
>
> Quoting Ted Pedersen <duluth...@gmail.com>:
>
>> Thanks for your query. Could you send me the exact command you are
>> running? That would be helpful in understanding what is happening.
>> Also, if you could send this to the ngram mailing list rather than me
>> directly, that would be helpful as I'm sure other users would be
>> interested.
>>
>> http://tech.groups.yahoo.com/group/ngram/
>>
>> Thanks!
>> Ted
>>
>> On Sat, Dec 5, 2009 at 5:13 AM, Ronojoy Adhikari <r...@imsc.res.in> wrote:
>>
>> > Dear Prof. Pedersen,
>> >
>> > I am a user of your NSP software and I am taking the liberty of bothering
>> > you with a query.
>> >
>> > I have been trying to test for alternative hypotheses for independence of
>> > trigrams using the -set_freq_combo flags in count.pl and statistic.pl.
>> > While this works fine for count.pl, statistic.pl invariably leads to an
>> > error message of the form:
>> >
>> > Frequency combination "x" missing!
>> >
>> > where "x" could be any of "0 1 2", "0", "1", "2", or "0 1" and
>> > permutations.
>> > I have tried every possible combination of these and the only combination
>> > which works is when the full set
>> >
>> > 0 1 2
>> >
>> > 1
>> > 2
>> > 0 1
>> > 0 2
>> > 1 2
>> >
>> > is specified in the -set_freq_combo file. Am I doing something wrong or
>> > does NSP only do the default trigram hypothesis test of
>> > P(w1w2w3)=P(w1)*P(w2)*P(w3)? Your help would be much appreciated.
>> >
>> > Thanks in advance,
>> >
>> > Ronojoy Adhikari.
>> >
>> > ________________________________________________________________________________
>> > Dr. Ronojoy Adhikari
>> > The Institute of Mathematical Sciences Tel: +91(44)2254 3253
>> > Chennai 600113 India Fax: +91(44)2254 1586
>> > email:r...@imsc.res.in URL: http://www.imsc.res.in/~rjoy
>> > ________________________________________________________________________________
>>
>> --
>> Ted Pedersen
>> http://www.d.umn.edu/~tpederse
>
> ----------------------------------------------------------------
> This message was sent using IMP, the Internet Messaging Program.
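
For anyone who wants to experiment with the P(w1w2w3) = P(w1)*P(w2w3) hypothesis outside of statistic.pl, the following is a minimal sketch in Python; it is not part of NSP. It assumes the eight counts for a single trigram are already available, named after the n111/n1pp/np1p/npp1/n11p/n1p1/np11/nppp convention of NSP's 3-d measures (the "0 1 2", "0", "1", "2", "0 1", "0 2" and "1 2" frequency combinations plus the total number of trigrams); all function names here are hypothetical. It rebuilds the 2x2x2 contingency table and computes the log-likelihood ratio G^2 = 2 * sum n_ijk * log(n_ijk / m_ijk) under two null hypotheses: full independence, with m_ijk = n_i++ * n_+j+ * n_++k / N^2, and w1 independent of the pair (w2, w3), with m_ijk = n_i++ * n_+jk / N. Because every cell of the observed table depends on all seven frequency combinations, this sketch needs the full set, and its values are not guaranteed to match ll.pm output digit for digit.

```python
import math

R = range(2)  # index 0 = "this word in this position", 1 = "some other word"


def table_from_counts(n111, n11p, n1p1, np11, n1pp, np1p, npp1, nppp):
    """Rebuild the observed 2x2x2 table n[i][j][k] for one trigram w1 w2 w3.

    Assumed naming: n1pp = trigrams with w1 in position 0, n11p = w1 w2 in
    positions 0 and 1, n1p1 = w1 and w3 in positions 0 and 2, and so on;
    nppp = total number of trigrams.
    """
    return [
        [[n111, n11p - n111],
         [n1p1 - n111, n1pp - n11p - n1p1 + n111]],
        [[np11 - n111, np1p - n11p - np11 + n111],
         [npp1 - n1p1 - np11 + n111,
          nppp - n1pp - np1p - npp1 + n11p + n1p1 + np11 - n111]],
    ]


def _margins(n):
    """Return N, n_i++, n_+j+, n_++k and the (w2, w3) pair margin n_+jk."""
    N = sum(n[i][j][k] for i in R for j in R for k in R)
    ni = [sum(n[i][j][k] for j in R for k in R) for i in R]
    nj = [sum(n[i][j][k] for i in R for k in R) for j in R]
    nk = [sum(n[i][j][k] for i in R for j in R) for k in R]
    njk = [[sum(n[i][j][k] for i in R) for k in R] for j in R]
    return N, ni, nj, nk, njk


def expected_mutual_independence(n):
    """H0: P(w1 w2 w3) = P(w1) P(w2) P(w3), so m_ijk = n_i++ n_+j+ n_++k / N^2."""
    N, ni, nj, nk, _ = _margins(n)
    return [[[ni[i] * nj[j] * nk[k] / N**2 for k in R] for j in R] for i in R]


def expected_w1_vs_w2w3(n):
    """H0: P(w1 w2 w3) = P(w1) P(w2 w3), so m_ijk = n_i++ n_+jk / N."""
    N, ni, _, _, njk = _margins(n)
    return [[[ni[i] * njk[j][k] / N for k in R] for j in R] for i in R]


def g2(observed, expected):
    """Log-likelihood ratio G^2 = 2 * sum n log(n / m); empty cells add 0."""
    total = 0.0
    for i in R:
        for j in R:
            for k in R:
                o = observed[i][j][k]
                if o > 0:  # a positive cell always has a positive expected value
                    total += o * math.log(o / expected[i][j][k])
    return 2.0 * total


# Toy counts (invented for illustration): w1 is independent of the pair
# (w2, w3), but w2 and w3 are associated with each other.
n = table_from_counts(n111=10, n11p=30, n1p1=40, np11=40,
                      n1pp=100, np1p=120, npp1=160, nppp=400)
print(g2(n, expected_w1_vs_w2w3(n)))           # 0.0: the P(w1)*P(w2w3) model fits exactly
print(g2(n, expected_mutual_independence(n)))  # positive, since w2 and w3 are associated
```

The toy counts are invented. In a 2x2x2 table the P(w1)*P(w2w3) model leaves 3 degrees of freedom and full independence leaves 4, which is what you would use when looking up a chi-squared critical value for either G^2.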