Hi Karin,

I think the problem you are having comes from two things: your two lists
contain different numbers of word pairs, and most of the word pairs appear in
only one of the lists (only the pairs with krävas, underlätta, göra and
fortsätta appear in both). In general rank.pl expects the two input files to
be made up of the same word pairs, just ranked differently (for example, by a
different measure of association). When that isn't the case, the program
eliminates any word pairs that aren't in both files and then runs. I think
this combination of issues is causing rank.pl to return this very unexpected
value.
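
To make the filtering step concrete, here is a minimal Python sketch. It is not rank.pl's code, and everything in it is an assumption: that rank.pl keeps only the pairs present in both files, re-ranks those pairs by score with tied pairs sharing the best rank (1, 1, 3, 4), and then applies the textbook Spearman formula r = 1 - 6*sum(d^2) / (n*(n^2 - 1)), which assumes there are no ties. The function names and the LIST_65 / LIST_95 variables are hypothetical; the data lines are copied from the attached files.

```python
# Hypothetical illustration, NOT rank.pl's actual code. Assumptions:
#  * lines look like "word1<>word2<>rank score n11 n1p np1" (as in the attachments)
#  * only pairs present in both lists are kept
#  * kept pairs are re-ranked by score, ties sharing the best rank (1, 1, 3, 4)
#  * the no-ties shortcut r = 1 - 6*sum(d^2) / (n*(n^2 - 1)) is then applied


def parse_nsp(text):
    """Return {'word1<>word2<>': score} for each non-empty line."""
    scores = {}
    for line in text.strip().splitlines():
        *words, stats = line.strip().split("<>")
        scores["<>".join(words) + "<>"] = float(stats.split()[1])
    return scores


def competition_ranks(scores):
    """Rank 1 = highest score; tied entries share the best rank."""
    return {k: 1 + sum(1 for s in scores.values() if s > v)
            for k, v in scores.items()}


def shortcut_spearman(text1, text2):
    """Filter to shared pairs, re-rank each list, apply the no-ties formula."""
    a, b = parse_nsp(text1), parse_nsp(text2)
    shared = sorted(a.keys() & b.keys())  # pairs found in both lists
    ra = competition_ranks({k: a[k] for k in shared})
    rb = competition_ranks({k: b[k] for k in shared})
    n = len(shared)
    d2 = sum((ra[k] - rb[k]) ** 2 for k in shared)
    return 1 - 6 * d2 / (n * (n * n - 1)), shared


# Excerpts of the two attached lists (unshared pairs are dropped anyway).
LIST_65 = """
sky<>ansträngning<>505 25.1952 2 5 15
krävas<>ansträngning<>2172 8.2948 1 17 15
underlätta<>ansträngning<>2172 8.2948 1 17 15
göra<>ansträngning<>2915 6.3704 3 1323 15
fortsätta<>ansträngning<>3097 6.0043 1 53 15
lägga<>ansträngning<>4745 3.0005 1 253 15
"""

LIST_95 = """
intensifiera<>ansträngning<>3951 40.5247 3 22 33
göra<>ansträngning<>4665 35.6553 12 20089 33
fortsätta<>ansträngning<>8254 21.8238 3 468 33
kräva<>ansträngning<>10206 17.4829 3 973 33
underlätta<>ansträngning<>20704 8.1388 1 98 33
krävas<>ansträngning<>25718 6.0348 1 283 33
"""

r, shared = shortcut_spearman(LIST_65, LIST_95)
print(len(shared), "shared pairs")
print(f"Rank correlation coefficient = {r:.4f}")
```

Run as-is, this prints 4 shared pairs and `Rank correlation coefficient = -1.1000`. Under these assumptions the out-of-range value comes from the tie between krävas and underlätta in the first list: the shortcut formula is only guaranteed to stay within [-1, 1] when each list's ranks are a permutation of 1..n, and ranks such as 1, 1, 3, 4 are not.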

My guess is that the cause is the difference in the number of input pairs
between the two files, but I will do some more checking in the next day or
two to be sure. Here's a link to the rank.pl documentation that describes
how this particular case is intended to be handled:

http://search.cpan.org/dist/Text-NSP/bin/utils/rank.pl#1.4._Dealing_with_Dissimilar_Lists_of_N-grams

More soon,
Ted


On Tue, Feb 5, 2013 at 10:06 AM, Ted Pedersen <tpede...@d.umn.edu> wrote:

> [Attachment(s) from Ted Pedersen included below]
>
> ---------- Forwarded message ----------
> From: Karin Cavallin <karin.caval...@ling.gu.se>
> Date: Tue, Feb 5, 2013 at 8:53 AM
> Subject: -1.1000(sic!) as result from rank.pl
> To: "tpede...@umn.edu" <tpede...@umn.edu>
>
> Dear Professor Ted,
>
> I didn't know whom to report this error to, so I hope you can forward
> this to the appropriate recipient.
>
> I have been using NSP for a while, especially the bigram packages.
> I'm working with lexical sets of verbal predicates and their nominal
> objects, and doing collocational analysis on them.
> I wanted to compare the rankings of sets coming from different
> corpora. (I know it is not very interesting to compare rankings of such
> different data, but I am trying different things for my thesis.)
>
> Today I noticed that one lexical set got a coefficient of -1.1000, which
> should not be possible! (I have only seen this happen once.)
>
> karin$ rank.pl 65_anstr.txt 95_anstr.txt
> Rank correlation coefficient = -1.1000
>
> I have attached the files that produce this weird outcome.
>
> Best regards
> /karin
>
> Karin Cavallin
> PhD Student in Computational Linguistics
> University of Gothenburg, Sweden
>
> sky<>ansträngning<>505 25.1952 2 5 15
> fördubbla<>ansträngning<>1582 10.8890 1 5 15
> koncentrera<>ansträngning<>1912 9.1951 1 11 15
> krävas<>ansträngning<>2172 8.2948 1 17 15
> underlätta<>ansträngning<>2172 8.2948 1 17 15
> märka<>ansträngning<>2471 7.4301 1 26 15
> göra<>ansträngning<>2915 6.3704 3 1323 15
> fortsätta<>ansträngning<>3097 6.0043 1 53 15
> kosta<>ansträngning<>3723 4.8170 1 97 15
> och<>ansträngning<>4162 4.0424 1 145 15
> sätta<>ansträngning<>4482 3.4540 1 198 15
> lägga<>ansträngning<>4745 3.0005 1 253 15
>
> intensifiera<>ansträngning<>3951 40.5247 3 22 33
> göra<>ansträngning<>4665 35.6553 12 20089 33
> fortsätta<>ansträngning<>8254 21.8238 3 468 33
> kräva<>ansträngning<>10206 17.4829 3 973 33
> trotsa<>ansträngning<>17176 9.9897 1 39 33
> välkomna<>ansträngning<>18254 9.3712 1 53 33
> underlätta<>ansträngning<>20704 8.1388 1 98 33
> döma<>ansträngning<>22762 7.1873 1 158 33
> skada<>ansträngning<>23084 7.0537 1 169 33
> rikta<>ansträngning<>23084 7.0537 1 169 33
> ha<>ansträngning<>23176 7.0134 1 89009 33
> stödja<>ansträngning<>25349 6.1642 1 265 33
> krävas<>ansträngning<>25718 6.0348 1 283 33
> vara<>ansträngning<>29926 4.5609 1 603 33
> leda<>ansträngning<>30789 4.2612 1 705 33
> öka<>ansträngning<>33145 3.4625 1 1076 33
