---------- Forwarded message ----------
From: r...@imsc.res.in <r...@imsc.res.in>
Date: Dec 5, 2009 11:59 AM
Subject: Re: nsp/trigram significance
To: Ted Pedersen <duluth...@gmail.com>


May I request you to forward this to the list? I don't use Yahoo, and it
seems posting to the mailing list is disallowed without a Yahoo
account. Could you please cc me on the reply? Thanks.

------

To outline what I did:

The following steps *work* :

(1) A corpus of about 10,000 tokens (417 distinct tokens), stored in
corpus.txt

(2) count.pl --ngram 3 --set_freq_combo combofile.txt corpus-3.cnt corpus.txt

(3) statistic.pl --ngram 3 --set_freq_combo combofile.txt ll.pm
corpus-3sig.txt corpus-3.cnt

where combofile.txt is of the form

0 1 2

1
2
0 1
0 2
1 2

The following *does not work* :

Steps 1 - 3, but with combofile.txt as

0 1 2

1
0 1

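or any variant of the above. I want to test the hypothesis P(w1w2w3) =
P(w1)*P(w2w3), and it appears from the documentation that changing the
frequency combinations is the way to do this. I couldn't get it to
work, however: count.pl runs fine, but statistic.pl stops with the
'Frequency combination "x" missing!' error described in my earlier
message below.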
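
For reference, the hypothesis P(w1w2w3) = P(w1)*P(w2w3) can be checked by hand
as a 2x2 contingency table: "first word is w1 or not" against "second and
third words are w2 w3 or not". The sketch below is only an illustration under
that assumption and is not NSP's ll.pm implementation; the function name and
argument names are hypothetical. The counts it needs are the trigram count,
the count of w1 in the first position, the count of w2 w3 in the last two
positions, and the total number of trigrams.

```python
import math


def g2_w1_vs_w2w3(n_123, n_1, n_23, n_total):
    """Log-likelihood ratio G^2 for H0: P(w1 w2 w3) = P(w1) * P(w2 w3).

    n_123   : count of the trigram w1 w2 w3
    n_1     : count of trigrams with w1 in the first position
    n_23    : count of trigrams with w2 w3 in positions two and three
    n_total : total number of trigrams in the corpus
    """
    # Observed 2x2 table.
    # Rows: first word is w1 / is not w1.
    # Columns: last two words are w2 w3 / are not w2 w3.
    observed = [
        [n_123, n_1 - n_123],
        [n_23 - n_123, n_total - n_1 - n_23 + n_123],
    ]
    row_totals = [n_1, n_total - n_1]
    col_totals = [n_23, n_total - n_23]

    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            # Expected count under independence of w1 and (w2 w3).
            e = row_totals[i] * col_totals[j] / n_total
            if o > 0:  # a zero cell contributes nothing to the sum
                g2 += o * math.log(o / e)
    return 2.0 * g2
```

When the observed trigram count equals its expected value n_1 * n_23 / n_total,
the statistic is zero; larger values indicate stronger departure from the
hypothesis.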




Quoting Ted Pedersen <duluth...@gmail.com>:

> Thanks for your query. Could you send me the exact command you are
> running? That would be helpful in understanding what is happening.
> Also, if you could send this to the ngram mailing list rather than me
> directly, that would be helpful as I'm sure other users would be
> interested.
>
> http://tech.groups.yahoo.com/group/ngram/
>
> Thanks!
> Ted
>
> On Sat, Dec 5, 2009 at 5:13 AM, Ronojoy Adhikari <r...@imsc.res.in> wrote:
>
> >
> > Dear Prof. Pedersen,
> >
> > I am a user of your NSP software and I am taking the liberty of bothering
> > you with a query.
> >
> > I have been trying to test alternative hypotheses for independence of
> > trigrams using the --set_freq_combo flag in count.pl and statistic.pl. While
> > this works fine for count.pl, statistic.pl invariably leads to an error
> > message of the form:
> >
> > Frequency combination "x" missing!
> >
> > where "x" could be any of "0 1 2", "0", "1", "2", or "0 1" and permutations.
> > I have tried every possible combination of these and the only combination
> > which works is when the full set
> >
> > 0 1 2
> >
> > 1
> > 2
> > 0 1
> > 0 2
> > 1 2
> >
> > is specified in the -set_freq_combo file. Am I doing something wrong or does
> > NSP only do the default trigram hypothesis test of
> > P(w1w2w3)=P(w1)*P(w2)*P(w3) ? Your help would be much appreciated.
> >
> > Thanks in advance,
> >
> > Ronojoy Adhikari.
> >
> > ________________________________________________________________________________
> > Dr. Ronojoy Adhikari
> > The Institute of Mathematical Sciences  Tel: +91(44)2254 3253
> > Chennai 600113 India                    Fax: +91(44)2254 1586
> > email:r...@imsc.res.in                  URL: http://www.imsc.res.in/~rjoy
> > ________________________________________________________________________________
> >
> >
> >
>
>
>
> --
> Ted Pedersen
> http://www.d.umn.edu/~tpederse
>
>






-- 
Ted Pedersen
http://www.d.umn.edu/~tpederse
