---------- Forwarded message ----------
From: r...@imsc.res.in <r...@imsc.res.in>
Date: Dec 5, 2009 11:59 AM
Subject: Re: nsp/trigram significance
To: Ted Pedersen <duluth...@gmail.com>
May I request that you forward this to the list? I don't use Yahoo, and it seems that posting to the mailing list is not allowed without a Yahoo account. Could you please cc me on the reply? Thanks.

------

To outline what I did, the following steps *work*:

(1) A corpus of about 10,000 tokens (417 distinct), in corpus.txt
(2) count.pl --ngram 3 --set_freq_combo combofile.txt corpus-3.cnt corpus.txt
(3) statistic.pl --ngram 3 --set_freq_combo combofile.txt ll.pm corpus-3sig.txt corpus-3.cnt

where combofile.txt has the form

0 1 2
0
1
2
0 1
0 2
1 2

The following *does not work*: steps 1-3, but with combofile.txt as 0 1 2 1 0 1, or any variant of the above.

I want to test the hypothesis P(w1w2w3) = P(w1)*P(w2w3), and it appears from the documentation that adjusting the frequency combinations is the way to do this. I couldn't get it to work, however.

Quoting Ted Pedersen <duluth...@gmail.com>:

> Thanks for your query. Could you send me the exact command you are
> running? That would be helpful in understanding what is happening.
> Also, if you could send this to the ngram mailing list rather than to me
> directly, that would be helpful, as I'm sure other users would be
> interested.
>
> http://tech.groups.yahoo.com/group/ngram/
>
> Thanks!
> Ted
>
> On Sat, Dec 5, 2009 at 5:13 AM, Ronojoy Adhikari <r...@imsc.res.in> wrote:
> >
> > Dear Prof. Pedersen,
> >
> > I am a user of your NSP software, and I am taking the liberty of bothering
> > you with a query.
> >
> > I have been trying to test alternative hypotheses for the independence of
> > trigrams using the --set_freq_combo option in count.pl and statistic.pl.
> > While this works fine for count.pl, statistic.pl invariably produces an
> > error message of the form:
> >
> > Frequency combination "x" missing!
> >
> > where "x" could be any of "0 1 2", "0", "1", "2", or "0 1" and its
> > permutations.
> >
> > I have tried every possible combination of these, and the only one that
> > works is when the full set
> >
> > 0 1 2
> > 0
> > 1
> > 2
> > 0 1
> > 0 2
> > 1 2
> >
> > is specified in the --set_freq_combo file. Am I doing something wrong, or
> > does NSP only do the default trigram hypothesis test of
> > P(w1w2w3) = P(w1)*P(w2)*P(w3)? Your help would be much appreciated.
> >
> > Thanks in advance,
> >
> > Ronojoy Adhikari
> >
> > ________________________________________________________________________________
> > Dr. Ronojoy Adhikari
> > The Institute of Mathematical Sciences    Tel: +91(44)2254 3253
> > Chennai 600113 India                      Fax: +91(44)2254 1586
> > email:r...@imsc.res.in                    URL: http://www.imsc.res.in/~rjoy
> > ________________________________________________________________________________
>
> --
> Ted Pedersen
> http://www.d.umn.edu/~tpederse
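For readers following the hypothesis question, here is a minimal Python sketch of the log-likelihood ratio (G^2) for the two trigram models discussed above. It assumes that a combo-file line such as "0" or "1 2" means "count how often the token(s) at those positions of a trigram window occur", and that the trigram ll measure fills a full 2x2x2 contingency table; both are assumptions about NSP rather than a description of its code, and every function name here (combo_counts, g2_w1_vs_w2w3, observed_cells_3d, g2_full_independence) is hypothetical. Under those assumptions, P(w1w2w3) = P(w1)*P(w2w3) needs only the combinations "0 1 2", "0" and "1 2", while the default model needs all seven, which would be one plausible reason statistic.pl reports a missing combination when given a reduced file.

```python
from collections import Counter
from math import log


# Hypothetical helper (assumed semantics, not NSP's code): a combo is a tuple of
# positions, e.g. (0,) or (1, 2), mirroring a combo-file line such as "0" or "1 2".
def combo_counts(tokens, combos, n=3):
    counts = {combo: Counter() for combo in combos}
    windows = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    for window in windows:
        for combo in combos:
            counts[combo][tuple(window[p] for p in combo)] += 1
    return counts, len(windows)


def _g2(pairs):
    # G^2 = 2 * sum(O * ln(O / E)); cells with O == 0 contribute nothing.
    return 2.0 * sum(o * log(o / e) for o, e in pairs if o > 0)


def g2_w1_vs_w2w3(n111, n1pp, np11, nppp):
    """H0: P(w1 w2 w3) = P(w1) * P(w2 w3); uses only combos 0 1 2, 0 and 1 2."""
    rows = [n1pp, nppp - n1pp]  # w1 in position 0: yes / no
    cols = [np11, nppp - np11]  # (w2, w3) in positions 1, 2: yes / no
    observed = [[n111, n1pp - n111],
                [np11 - n111, nppp - n1pp - np11 + n111]]
    return _g2((observed[i][j], rows[i] * cols[j] / nppp)
               for i in range(2) for j in range(2))


def observed_cells_3d(n111, n11p, n1p1, np11, n1pp, np1p, npp1, nppp):
    """Fill the 2x2x2 table by inclusion-exclusion; index 0 = word present."""
    return {
        (0, 0, 0): n111,
        (0, 0, 1): n11p - n111,
        (0, 1, 0): n1p1 - n111,
        (1, 0, 0): np11 - n111,
        (0, 1, 1): n1pp - n11p - n1p1 + n111,
        (1, 0, 1): np1p - n11p - np11 + n111,
        (1, 1, 0): npp1 - n1p1 - np11 + n111,
        (1, 1, 1): nppp - n1pp - np1p - npp1 + n11p + n1p1 + np11 - n111,
    }


def g2_full_independence(n111, n11p, n1p1, np11, n1pp, np1p, npp1, nppp):
    """H0: P(w1 w2 w3) = P(w1) * P(w2) * P(w3), over all eight cells."""
    cells = observed_cells_3d(n111, n11p, n1p1, np11, n1pp, np1p, npp1, nppp)
    m1 = [n1pp, nppp - n1pp]
    m2 = [np1p, nppp - np1p]
    m3 = [npp1, nppp - npp1]
    return _g2((o, m1[i] * m2[j] * m3[k] / nppp ** 2)
               for (i, j, k), o in cells.items())


# Usage on a toy token stream, testing the trigram "a b c".
tokens = "a b c a b c a b d".split()
counts, total = combo_counts(tokens, [(0, 1, 2), (0,), (1, 2)])
n111 = counts[(0, 1, 2)][("a", "b", "c")]  # 2
n1pp = counts[(0,)][("a",)]                # 3
np11 = counts[(1, 2)][("b", "c")]          # 2
print(g2_w1_vs_w2w3(n111, n1pp, np11, total))
```

The w1-versus-(w2 w3) test treats the bigram w2w3 as a single unit, so it reduces to an ordinary 2x2 table; the full-independence model works on eight cells, and filling those observed cells is what requires the pairwise counts ("0 1", "0 2", "1 2") in addition to the single-word marginals.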