Dear Ted,

Thank you very much for your answer. I know that my question is not easy to
answer. I have been analysing the differences between scores and measures for
months, but it is very difficult to establish a parameter or pattern for
choosing the best measure and score.
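
To illustrate why a fixed cutoff is hard to establish, here is a minimal Python sketch of the sample-size sensitivity mentioned in the reply quoted below. It assumes the common t-score definition (observed count minus expected count, divided by the square root of the observed count); whether this matches the package's exact implementation is an assumption. The function names are hypothetical, and the total bi-gram count (400000) is invented for illustration, since the listings do not show it.

```python
import math

def t_score(n11, n1p, np1, npp):
    """t-score under the assumed definition: (observed - expected) / sqrt(observed)."""
    expected = n1p * np1 / npp  # expected joint count if the two words were independent
    return (n11 - expected) / math.sqrt(n11)

def scaled(counts, k):
    """Multiply every count by k, simulating a corpus k times larger with the same proportions."""
    return tuple(k * c for c in counts)

# n11, n1p, np1 are the earth<>station counts from the listing;
# the last value (total bi-grams) is HYPOTHETICAL.
counts = (1375, 2249, 2598, 400_000)

small = t_score(*counts)
large = t_score(*scaled(counts, 4))

# Same proportions, four times the data: the score doubles (sqrt(4) = 2),
# so a cutoff tuned on the smaller sample would not carry over.
print(small, large, large / small)
```

Under this definition the score grows with the square root of the corpus size even when the proportions stay the same, which is one reason a cutoff found on one sample may not suit another.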

At the moment, the Left measure is the best for ranking bi-grams, as you said
in the FAQ document.
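
As a rough illustration of comparing ranks rather than raw scores, here is a small Python sketch that parses listings like the ones quoted below and puts the rank each measure assigns to a bi-gram side by side. It assumes each line has the form `word1<>word2<>rank score n11 n1p np1`, which is how the quoted examples appear to be laid out; the helper names `parse_line` and `ranks_by_bigram` are hypothetical.

```python
def parse_line(line):
    """Split 'w1<>w2<>rank score n11 n1p np1' into (words, rank, score, counts)."""
    *words, rest = line.strip().split("<>")
    rank, score, *counts = rest.split()
    return tuple(words), int(rank), float(score), [int(c) for c in counts]

# Lines copied from the quoted message, grouped by measure.
listings = {
    "TMI": ["earth<>station<>1 0.0205 1375 2249 2598",
            "signal<>unit<>5 0.0102 958 5446 1900"],
    "Left": ["earth<>station<>1 1.0000 1375 2249 2598",
             "signal<>unit<>1 1.0000 958 5446 1900"],
    "Tscore": ["earth<>station<>1 36.7029 1375 2249 2598",
               "signal<>unit<>2 30.1494 958 5446 1900"],
}

def ranks_by_bigram(listings):
    """Map each bi-gram to a {measure: rank} dictionary."""
    table = {}
    for measure, lines in listings.items():
        for line in lines:
            words, rank, _score, _counts = parse_line(line)
            table.setdefault(words, {})[measure] = rank
    return table

for words, ranks in ranks_by_bigram(listings).items():
    print(" ".join(words), ranks)
```

The scores are on different scales for each measure, so only the ranks are directly comparable here; the sketch deliberately ignores the score column when building the table.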

Well, I will keep thinking about it!

Best regards,
Mercè

--- In ngram@yahoogroups.com, Ted Pedersen <duluth...@...> wrote:
>
> Greetings Merce,
> 
> Our FAQ tries to provide a little guidance on this issue...
> 
> http://search.cpan.org/dist/Text-NSP/doc/FAQ.pod
> 
> The short answer though is that there probably isn't a single measure
> that is always the "best" choice. Worse yet, in general there are not
> any clear "cutoffs" for any of the measures as to where you find a
> boundary between meaningful associations and spurious ones. Even when
> using p-values (in Fisher's exact test) you can set cutoffs of .01, .05,
> .1, .001, .005 and so on with equal validity....
> 
> So, unfortunately, there is usually a bit of trial and error involved.
> Some of the measures' scores are sensitive to sample size, and so even
> if you find a nice cutoff for one sample of data, you might not want
> to use that for another sample of data (if it is larger or smaller).
> 
> I wish I had clearer guidance to offer, but generally speaking I don't
> think there are obvious answers to your question. (I would love to
> learn I was wrong about this though, so if anyone has advice please do
> come forward!)
> 
> Cordially,
> Ted
> 
> On Wed, Apr 15, 2009 at 10:36 AM, mercevg <merc...@...> wrote:
> >
> >
> > Dear all,
> >
> > I would like to know how to select the best score for each n-gram. At the
> > moment, I have my list of bi-gram counts filtered by the statistical
> > measures. Here are some examples:
> >
> > TMI
> > earth<>station<>1 0.0205 1375 2249 2598
> > signal<>unit<>5 0.0102 958 5446 1900
> >
> > Left
> > earth<>station<>1 1.0000 1375 2249 2598
> > signal<>unit<>1 1.0000 958 5446 1900
> >
> > Tscore
> > earth<>station<>1 36.7029 1375 2249 2598
> > signal<>unit<>2 30.1494 958 5446 1900
> >
> > How can I decide which of these three measures gives the best score for
> > each bi-gram? Or, in this case, maybe I should consider just the rank value
> > and not the score value when choosing a collocation.
> >
> > Best regards,
> > Mercè
> >
> > 
> 
> 
> 
> -- 
> Ted Pedersen
> http://www.d.umn.edu/~tpederse
>

