Hi Karin,

This is very interesting, and I will certainly look into it further and
report back! Thank you for the additional information; it does seem
like an interesting case.
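In the meantime, one note for anyone following along: Spearman's rank
correlation is guaranteed to fall within [-1, 1] when tied scores are given
average ranks and the coefficient is computed as a Pearson correlation over
those ranks. An out-of-range value like -1.1000 often points at the
simplified 1 - 6*sum(d^2)/(n*(n^2-1)) shortcut being applied to tied data,
though I haven't yet confirmed that this is what happens inside rank.pl.
Here is a minimal sketch (illustrative Python, not the actual Perl code in
rank.pl) of the tie-safe computation over the pairs shared by two lists:

```python
def average_ranks(scores):
    """Rank items by descending score, giving tied scores their average rank."""
    ordered = sorted(scores, key=lambda p: -scores[p])
    ranks = {}
    i = 0
    while i < len(ordered):
        # Find the extent of the run of tied scores starting at position i.
        j = i
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[ordered[k]] = avg
        i = j + 1
    return ranks

def spearman(list1, list2):
    """Spearman's rho over the pairs common to both lists; always in [-1, 1].

    list1 and list2 map each word pair to its association score.
    Returns None when fewer than two pairs are shared or one list is constant.
    """
    shared = sorted(set(list1) & set(list2))
    n = len(shared)
    if n < 2:
        return None
    r1 = average_ranks({p: list1[p] for p in shared})
    r2 = average_ranks({p: list2[p] for p in shared})
    mean = (n + 1) / 2  # mean of ranks 1..n
    num = sum((r1[p] - mean) * (r2[p] - mean) for p in shared)
    den1 = sum((r1[p] - mean) ** 2 for p in shared)
    den2 = sum((r2[p] - mean) ** 2 for p in shared)
    if den1 == 0 or den2 == 0:
        return None
    return num / (den1 * den2) ** 0.5
```

Computed this way, perfectly reversed rankings give exactly -1.0 and
identical rankings give exactly 1.0, with ties never pushing the result
outside that range.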

More soon!
Ted


On Wed, Feb 6, 2013 at 2:34 AM, Karin Cavallin <karin.caval...@ling.gu.se> wrote:

>  Hi Ted
>
> since I compare 5 different corpora (size-wise and occurrence-wise),
> basically all the sets have different numbers of pairs. I have run rank.pl
> on more than 100,000 lexical sets. Most of them get no ranking
> coefficient, since there are no co-occurrences between the sets; some of
> them do get a coefficient ranging from -1.0000 to 1.0000, as expected. One
> lexical set gets this -1.1000, the one I sent you.
>
>  So, I don't think it is due to the sets being too different, but
> something that is beyond me. That's why I thought it was important to report
> it to you.
>
>  /karin
>
> Karin Cavallin
> PhD Student in Computational Linguistics
> University of Gothenburg, Sweden
>
>   ------------------------------
> *From:* duluth...@gmail.com [duluth...@gmail.com] on behalf of Ted Pedersen [
> tpede...@d.umn.edu]
> *Sent:* 6 February 2013 03:35
> *To:* ngram@yahoogroups.com
> *Cc:* Karin Cavallin
> *Subject:* Re: [ngram] Fwd: -1.1000(sic!) as result from rank.pl [1
> Attachment]
>
>   Hi Karin,
>
>  I think the problem you are having is due to the fact that you have
> different numbers of word pairs in each list, and the fact that most of the
> word pairs are unique to each list. In general, rank.pl expects the
> two input files to be made up of the same pairs of words (just ranked
> differently, by a different measure of association, for example). When that
> isn't the case, the program will eliminate any word pairs that aren't in
> both files and then run. So, I think this combination of issues is causing
> rank.pl to return this very unexpected value.
>
>  My guess is that it's the fact that the number of input pairs is
> different in each file, but I will do a little more checking in the next
> day or two to know for sure. Here's a link to the rank.pl documentation
> that describes how this particular case is intended to be
> handled....
>
>
> http://search.cpan.org/dist/Text-NSP/bin/utils/rank.pl#1.4._Dealing_with_Dissimilar_Lists_of_N-grams
>
>  More soon,
> Ted
>
>
> On Tue, Feb 5, 2013 at 10:06 AM, Ted Pedersen <tpede...@d.umn.edu> wrote:
>
>>
>> ---------- Forwarded message ----------
>> From: Karin Cavallin <karin.caval...@ling.gu.se>
>> Date: Tue, Feb 5, 2013 at 8:53 AM
>> Subject: -1.1000(sic!) as result from rank.pl
>> To: "tpede...@umn.edu" <tpede...@umn.edu>
>>
>> Dear Professor Ted,
>>
>> I didn't know whom to report this error to, so I hope you can forward
>> this to the appropriate receiver.
>>
>> I have been using the NSP for a while, especially the bigram packages.
>> I'm working with lexical sets of verbal predicates and nominal objects,
>> and doing collocational analysis on them.
>> I wanted to compare the rankings between sets coming from different
>> corpora. (I know it is quite uninteresting to do ranking on such
>> different data, but I am trying different things for my thesis.)
>>
>> Today I noticed one lexical set to get -1.1000, which should not be
>> possible! (I have only noticed this one time.)
>>
>> karin$ rank.pl 65_anstr.txt 95_anstr.txt
>> Rank correlation coefficient = -1.1000
>>
>> I attached the files from which I get this weird outcome.
>>
>> Best regards
>> /karin
>>
>> Karin Cavallin
>> PhD Student in Computational Linguistics
>> University of Gothenburg, Sweden
>>
>> sky<>ansträngning<>505 25.1952 2 5 15
>> fördubbla<>ansträngning<>1582 10.8890 1 5 15
>> koncentrera<>ansträngning<>1912 9.1951 1 11 15
>> krävas<>ansträngning<>2172 8.2948 1 17 15
>> underlätta<>ansträngning<>2172 8.2948 1 17 15
>> märka<>ansträngning<>2471 7.4301 1 26 15
>> göra<>ansträngning<>2915 6.3704 3 1323 15
>> fortsätta<>ansträngning<>3097 6.0043 1 53 15
>> kosta<>ansträngning<>3723 4.8170 1 97 15
>> och<>ansträngning<>4162 4.0424 1 145 15
>> sätta<>ansträngning<>4482 3.4540 1 198 15
>> lägga<>ansträngning<>4745 3.0005 1 253 15
>>
>> intensifiera<>ansträngning<>3951 40.5247 3 22 33
>> göra<>ansträngning<>4665 35.6553 12 20089 33
>> fortsätta<>ansträngning<>8254 21.8238 3 468 33
>> kräva<>ansträngning<>10206 17.4829 3 973 33
>> trotsa<>ansträngning<>17176 9.9897 1 39 33
>> välkomna<>ansträngning<>18254 9.3712 1 53 33
>> underlätta<>ansträngning<>20704 8.1388 1 98 33
>> döma<>ansträngning<>22762 7.1873 1 158 33
>> skada<>ansträngning<>23084 7.0537 1 169 33
>> rikta<>ansträngning<>23084 7.0537 1 169 33
>> ha<>ansträngning<>23176 7.0134 1 89009 33
>> stödja<>ansträngning<>25349 6.1642 1 265 33
>> krävas<>ansträngning<>25718 6.0348 1 283 33
>> vara<>ansträngning<>29926 4.5609 1 603 33
>> leda<>ansträngning<>30789 4.2612 1 705 33
>> öka<>ansträngning<>33145 3.4625 1 1076 33
>>  
>>
>
>
