In fact, this bug is a little different from what I first thought. It turns out that my files were not formatted correctly. :) I had forgotten the ngram count in the first line of the file:
marimba(6): perl rank.pl a b
Rank correlation coefficient = 0.5000
marimba(7): cat a
4
first<>bigram<>1 4.000 1 1
second<>bigram<>2 3.000 2 2
extra<>bigram1<>3 2.000 3 3
third<>bigram<>4 1.000 4 4
marimba(8): cat b
4
second<>bigram<>1 4.000 2 2
extra<>bigram2<>2 3.000 4 4
first<>bigram<>3 2.000 1 1
third<>bigram<>4 1.000 3 3

So, if your file is formatted correctly, this should not be a problem. But we should recover a bit more gracefully from this kind of input error, so we'll work on that!

Thanks!
Ted

On Thu, Feb 14, 2013 at 10:16 AM, Ted Pedersen <tpede...@d.umn.edu> wrote:
> A user reports a bug in rank.pl. This seems to occur when dealing with
> smaller files, for example...
>
> marimba(49): more x
> first<>bigram<>1 4.000 1 1
> second<>bigram<>2 3.000 2 2
> extra<>bigram1<>3 2.000 3 3
> third<>bigram<>4 1.000 4 4
>
> marimba(50): more y
> second<>bigram<>1 4.000 2 2
> extra<>bigram2<>2 3.000 4 4
> first<>bigram<>3 2.000 1 1
> third<>bigram<>4 1.000 3 3
>
> New version (0.03)
> marimba(51): rank.pl x y
> Illegal division by zero at /usr/local/bin/rank.pl line 397.
>
> Old version (0.01)
> marimba(52): perl ./rank.pl x y
> Rank correlation coefficient = 0.5000
>
> There are also cases where rank.pl will report, falsely, that there
> are no ngrams in common between the input files. Again, this seems to
> occur with smaller files.
>
> We are checking into this, and if you've observed anything similar
> please do let us know!
>
> Cordially,
> Ted
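The more graceful recovery for a missing ngram count could look roughly like the sketch below. This is a hypothetical Python illustration, not code taken from rank.pl.

- The names parse_ngram_file, average_ranks, rank_correlation and NgramFileError are invented for this example.
- It assumes the first whitespace-separated field after the last "<>" on each line is the ngram's rank.
- It assumes ranks are recomputed over only the ngrams the two files share.

Under those assumptions it reproduces the 0.5000 reported for files a and b. The correlation is Spearman's rho, computed as Pearson's correlation of the ranks. With no ties this equals the familiar 1 - 6*sum(d^2)/(n(n^2-1)).

```python
"""Hypothetical sketch of a more forgiving rank comparison for ngram files.
Not the actual rank.pl logic; the file layout below is an assumption."""

import math


class NgramFileError(ValueError):
    """Raised when an ngram file does not match the expected layout."""


def parse_ngram_file(lines):
    """Return {ngram: rank}. Assumed layout: line 1 is the ngram count,
    every other line looks like 'tok1<>tok2<>rank score ...'."""
    lines = [ln.strip() for ln in lines if ln.strip()]
    if not lines:
        raise NgramFileError("file is empty")
    header = lines[0]
    if not header.isdigit():
        # This is the case that produced the division by zero above.
        raise NgramFileError(
            f"first line should be the ngram count, got {header!r}")
    ranks = {}
    for ln in lines[1:]:
        ngram, sep, rest = ln.rpartition("<>")
        if not sep or not rest.split():
            raise NgramFileError(f"malformed ngram line: {ln!r}")
        ranks[ngram + sep] = float(rest.split()[0])  # assumed rank field
    if len(ranks) != int(header):
        raise NgramFileError(
            f"header says {header} ngrams but found {len(ranks)}")
    return ranks


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean
        i = j + 1
    return ranks


def rank_correlation(ranks_a, ranks_b):
    """Spearman's rho over the ngrams that both files contain."""
    common = sorted(set(ranks_a) & set(ranks_b))
    if len(common) < 2:
        raise NgramFileError(
            f"need at least 2 ngrams in common, found {len(common)}")
    ra = average_ranks([ranks_a[g] for g in common])
    rb = average_ranks([ranks_b[g] for g in common])
    mean = (len(common) + 1) / 2  # average ranks always sum to n(n+1)/2
    cov = sum((x - mean) * (y - mean) for x, y in zip(ra, rb))
    var_a = sum((x - mean) ** 2 for x in ra)
    var_b = sum((y - mean) ** 2 for y in rb)
    if var_a == 0 or var_b == 0:
        raise NgramFileError("all shared ngrams are tied; rho is undefined")
    return cov / math.sqrt(var_a * var_b)


# The example files a and b from this thread.
FILE_A = ["4", "first<>bigram<>1 4.000 1 1", "second<>bigram<>2 3.000 2 2",
          "extra<>bigram1<>3 2.000 3 3", "third<>bigram<>4 1.000 4 4"]
FILE_B = ["4", "second<>bigram<>1 4.000 2 2", "extra<>bigram2<>2 3.000 4 4",
          "first<>bigram<>3 2.000 1 1", "third<>bigram<>4 1.000 3 3"]

if __name__ == "__main__":
    rho = rank_correlation(parse_ngram_file(FILE_A), parse_ngram_file(FILE_B))
    print(f"Rank correlation coefficient = {rho:.4f}")
    try:
        parse_ngram_file(FILE_A[1:])  # same data without the count, like x
    except NgramFileError as err:
        print(f"error: {err}")
```

Only three bigrams are shared (extra<>bigram1<> and extra<>bigram2<> differ), so n = 3 and the re-ranked lists give rho = 0.5. Fed the headerless files x and y, the parser stops with a message naming the missing count instead of failing later with a division by zero. Requiring at least two shared ngrams also turns the "no ngrams in common" case into an explicit error.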