Dear Ted,

Thank you very much for your answer. I know that my question is not easy to answer. I have been analysing the differences between scores and measures for months, but it is very difficult to establish a criterion or pattern for choosing the best measure and score.
At the moment, the Left measure is the best for ranking bi-grams, as you said in the FAQ document. Well, I will keep thinking about it!

Best regards,
Mercè

--- In ngram@yahoogroups.com, Ted Pedersen <duluth...@...> wrote:
>
> Greetings Merce,
>
> Our FAQ tries to provide a little guidance on this issue...
>
> http://search.cpan.org/dist/Text-NSP/doc/FAQ.pod
>
> The short answer though is that there probably isn't a single measure
> that is always the "best" choice. Worse yet, in general there are not
> any clear "cutoffs" for any of the measures as to where you find a
> boundary between meaningful associations and spurious ones. Even when
> using p-scores (in Fisher's Exact test) you can set cutoffs of .01 .05
> .1 .001 .005 and so on with equal validity....
>
> So, unfortunately, there is usually a bit of trial and error involved.
> Some of the measures' scores are sensitive to sample size, and so even
> if you find a nice cutoff for one sample of data, you might not want
> to use that for another sample of data (if it is larger or smaller).
>
> I wish I had clearer guidance to offer, but generally speaking I don't
> think there are obvious answers to your question. (I would love to
> learn I was wrong about this though, so if anyone has advice please do
> come forward!)
>
> Cordially,
> Ted
>
> On Wed, Apr 15, 2009 at 10:36 AM, mercevg <merc...@...> wrote:
> >
> > Dear all,
> >
> > I would like to know how to select the best score for each n-gram. At the
> > moment, I have my list of bi-gram counts filtered by the statistical
> > measures. Here are some examples:
> >
> > TMI
> > earth<>station<>1 0.0205 1375 2249 2598
> > signal<>unit<>5 0.0102 958 5446 1900
> >
> > Left
> > earth<>station<>1 1.0000 1375 2249 2598
> > signal<>unit<>1 1.0000 958 5446 1900
> >
> > Tscore
> > earth<>station<>1 36.7029 1375 2249 2598
> > signal<>unit<>2 30.1494 958 5446 1900
> >
> > How can I distinguish the best score between these three measures for each
> > bi-gram? Or, in this case, maybe I have to consider just the rank value and
> > not the score value to choose a collocation.
> >
> > Best regards,
> > Mercè
>
> --
> Ted Pedersen
> http://www.d.umn.edu/~tpederse
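
The closing question in the quoted message (compare ranks rather than raw scores) can be illustrated with a small sketch. It only rearranges the sample lines from the post so that each bi-gram's rank under TMI, Left and Tscore sits side by side. It assumes each line has the layout word1<>word2<>rank score n11 n1p np1 and that the words contain no spaces; parse_line and collect_ranks are hypothetical helper names, not part of Text-NSP.

```python
# Sample output lines copied from the post, keyed by measure name.
# Assumed layout per line: word1<>word2<>rank score n11 n1p np1
SAMPLE = {
    "TMI": [
        "earth<>station<>1 0.0205 1375 2249 2598",
        "signal<>unit<>5 0.0102 958 5446 1900",
    ],
    "Left": [
        "earth<>station<>1 1.0000 1375 2249 2598",
        "signal<>unit<>1 1.0000 958 5446 1900",
    ],
    "Tscore": [
        "earth<>station<>1 36.7029 1375 2249 2598",
        "signal<>unit<>2 30.1494 958 5446 1900",
    ],
}


def parse_line(line):
    """Hypothetical helper: return ((word1, word2), rank, score) for one line."""
    head, score, *_counts = line.split()
    word1, word2, rank = head.split("<>")
    return (word1, word2), int(rank), float(score)


def collect_ranks(sample):
    """Map each bi-gram to {measure: rank} so measures can be compared by rank."""
    ranks = {}
    for measure, lines in sample.items():
        for line in lines:
            bigram, rank, _score = parse_line(line)
            ranks.setdefault(bigram, {})[measure] = rank
    return ranks


if __name__ == "__main__":
    for bigram, by_measure in collect_ranks(SAMPLE).items():
        print("<>".join(bigram), by_measure)
```

The raw scores in the sample (0.0205, 1.0000, 36.7029) are on different scales, so the rank is the only field here that can be lined up directly across measures. This does not settle which measure is "best"; it just makes disagreements between measures, such as signal<>unit, easier to spot.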