Greetings Merce,

Our FAQ tries to provide a little guidance on this issue:

http://search.cpan.org/dist/Text-NSP/doc/FAQ.pod

The short answer, though, is that there probably isn't a single measure that is always the "best" choice. Worse yet, in general there are no clear "cutoffs" for any of the measures that mark a boundary between meaningful associations and spurious ones. Even when using p-values (as in Fisher's exact test) you can set cutoffs of .01, .05, .1, .001, .005 and so on with equal validity.

So, unfortunately, there is usually a bit of trial and error involved. Some of the measures' scores are sensitive to sample size, so even if you find a nice cutoff for one sample of data, you might not want to use it for another sample that is larger or smaller.

I wish I had clearer guidance to offer, but generally speaking I don't think there are obvious answers to your question. (I would love to learn I was wrong about this though, so if anyone has advice please do come forward!)

Cordially,
Ted

On Wed, Apr 15, 2009 at 10:36 AM, mercevg <merc...@yahoo.es> wrote:
>
> Dear all,
>
> I would like to know how to select the best score for each n-gram. At the
> moment, I have my list of bigram counts filtered by the statistical measures.
> Here are some examples:
>
> TMI
> earth<>station<>1 0.0205 1375 2249 2598
> signal<>unit<>5 0.0102 958 5446 1900
>
> Left
> earth<>station<>1 1.0000 1375 2249 2598
> signal<>unit<>1 1.0000 958 5446 1900
>
> Tscore
> earth<>station<>1 36.7029 1375 2249 2598
> signal<>unit<>2 30.1494 958 5446 1900
>
> How can I distinguish the best score between these three measures for each
> bigram? Or, in this case, maybe I have to consider just the rank value and
> not the score value to choose a collocation.
>
> Best regards,
> Mercè
>

--
Ted Pedersen
http://www.d.umn.edu/~tpederse
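
The rank-versus-score question in the quoted message can be made concrete with a small sketch. The raw scores in the examples sit on very different scales (0.0205 for TMI against 36.7029 for Tscore), so the sketch compares the measures by rank instead. It is only an illustration and not part of Text-NSP: the function names are hypothetical, it assumes each line has the layout shown in the quoted examples (words joined by <>, then rank, score, and three counts), and averaging ranks is one possible heuristic rather than a recommendation from the FAQ.

```python
from statistics import mean

# Sample lines copied from the quoted message, keyed by measure name.
SAMPLE = {
    "TMI": [
        "earth<>station<>1 0.0205 1375 2249 2598",
        "signal<>unit<>5 0.0102 958 5446 1900",
    ],
    "Left": [
        "earth<>station<>1 1.0000 1375 2249 2598",
        "signal<>unit<>1 1.0000 958 5446 1900",
    ],
    "Tscore": [
        "earth<>station<>1 36.7029 1375 2249 2598",
        "signal<>unit<>2 30.1494 958 5446 1900",
    ],
}


def parse_line(line):
    """Split one line into (bigram, rank, score); the counts are ignored.

    Assumed layout: word1<>word2<>rank score count count count
    """
    head, score, *_counts = line.split()
    *words, rank = head.split("<>")
    return " ".join(words), int(rank), float(score)


def ranks_by_bigram(tables):
    """Map each bigram to a {measure: rank} dictionary."""
    ranks = {}
    for measure, lines in tables.items():
        for line in lines:
            bigram, rank, _score = parse_line(line)
            ranks.setdefault(bigram, {})[measure] = rank
    return ranks


def mean_rank(ranks):
    """Average each bigram's rank over the measures (a heuristic, see above)."""
    return {bigram: mean(per_measure.values())
            for bigram, per_measure in ranks.items()}


if __name__ == "__main__":
    averaged = mean_rank(ranks_by_bigram(SAMPLE))
    # Lower mean rank means the measures agree the bigram is strongly associated.
    for bigram, value in sorted(averaged.items(), key=lambda item: item[1]):
        print(f"{bigram}\t{value:.2f}")
```

On the two bigrams above, "earth station" is ranked first by all three measures, while "signal unit" is ranked 5, 1 and 2. That kind of disagreement is visible in the ranks but hard to read off the raw scores. It still leaves open where to put a cutoff, which remains a matter of trial and error as described in the reply.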