That is very helpful, Ted. Thanks so much. There seem to be a fair number of papers in the world of psycholinguistics that use the term MI when they mean PMI.
Thanks for the pointer to the thesis. I really appreciate it.

Yours,
Cyrus

On Tue, Jun 7, 2011 at 6:42 PM, Ted Pedersen <tpede...@d.umn.edu> wrote:
> Hi Cyrus,
>
> There's nothing wrong with your formulation, although I would refer to
> what you describe as Pointwise Mutual Information (PMI), since it
> seems like it would only compute the probability of observing A B C D
> all together and separately, and not include the probabilities of A (not
> B) C D, and so forth. If you were doing that, then you'd be more in
> the realm of Mutual Information (or tmi, as we call it).
>
> Note that NSP does include a 3D version of PMI that essentially
> follows your definition.
>
> http://search.cpan.org/dist/Text-NSP/lib/Text/NSP/Measures/3D/MI/pmi.pm
>
> Extending it to 4D would not be difficult.
>
> If, on the other hand, you would like to do Mutual Information, remember
> that it differs from the Log-Likelihood Ratio only by a constant term, so
> you could use our 4D ll measure for that...
>
> http://search.cpan.org/dist/Text-NSP/lib/Text/NSP/Measures/4D/MI/ll.pm
>
> Also, some of the background for these trigram and 4-gram measures is
> described in Bridget McInnes' MS thesis:
>
> Extending the Log-Likelihood Ratio to Improve Collocation
> Identification (McInnes) - Master of Science Thesis, Department of
> Computer Science, University of Minnesota, Duluth, December 2004.
> http://www.d.umn.edu/~tpederse/Pubs/bridget-thesis.pdf
>
> There are some additional subtleties when you move beyond bigrams:
> rather than simply comparing the occurrence of an ngram
> to the model of independence (i.e., P(A,B)/P(A)P(B)), you have the option
> of comparing to other models (e.g., P(A,B,C)/P(A,B)P(C)). This becomes
> its own big, complicated issue which I won't go into much here, but it
> does open up a lot of interesting possibilities for longer ngrams that
> you don't have with bigrams. Some of this is discussed in more detail
> in Bridget's thesis.
>
> I hope this helps, and please do let us know if you have any
> additional questions, observations, or ideas.
>
> Good luck,
> Ted
>
> On Tue, Jun 7, 2011 at 5:26 PM, Cyrus Shaoul <cyrus.sha...@ualberta.ca> wrote:
>> Hi everyone,
>>
>> My apologies if this has been asked many times before, but
>> would this be an appropriate way to calculate the Mutual Information for a
>> 4-gram made up of words A, B, C, and D?
>>
>> MI(ABCD) = log(P(ABCD) / (P(A) x P(B) x P(C) x P(D)))
>>
>> If not, what is a better way? Why is this bad?
>>
>> Thanks for your help,
>>
>> Cyrus
>
> --
> Ted Pedersen
> http://www.d.umn.edu/~tpederse
>
> Yahoo! Groups Links

--
http://www.ualberta.ca/~cshaoul/
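[Editor's note: the PMI formulation discussed in this thread can be sketched in a few lines of Python. This is a minimal illustration from relative-frequency estimates, not the NSP implementation; the function names, the count-based probability estimates, and the example numbers are my own. The second function illustrates Ted's point about alternative independence models for longer ngrams, here comparing P(ABCD) against P(ABC)P(D).]

```python
import math

def pmi_4gram(count_abcd, count_a, count_b, count_c, count_d, n):
    """Pointwise MI of a 4-gram against the full-independence model:
    log( P(ABCD) / (P(A) * P(B) * P(C) * P(D)) ),
    with probabilities estimated as counts over corpus size n."""
    p_abcd = count_abcd / n
    p_indep = (count_a / n) * (count_b / n) * (count_c / n) * (count_d / n)
    return math.log(p_abcd / p_indep)

def pmi_4gram_vs_trigram(count_abcd, count_abc, count_d, n):
    """One of the alternative independence models available beyond bigrams:
    log( P(ABCD) / (P(ABC) * P(D)) )."""
    return math.log((count_abcd / n) / ((count_abc / n) * (count_d / n)))

# Hypothetical counts from a corpus of n = 1000 tokens:
# the 4-gram occurs 10 times, each word 100 times.
print(pmi_4gram(10, 100, 100, 100, 100, 1000))  # log(0.01 / 0.1**4) = log(100)
```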