Hi, Could you please explain the format of the .lm file generated by ngram-count? For example, I got an .lm file that starts with:

\data\
ngram 1=76288
ngram 2=1644644
ngram 3=1410926
ngram 4=1393383
ngram 5=1071864

\1-grams:
-2.815075 ! -1.648233
-3.10526 " -0.4596801
-6.09184 # -0.1521228
-4.628769 $ -0.2417951
-3.474399 % -0.7403963
-4.398747 & -0.7879647
-2.462822 ' -0.6111439

If I understand correctly, "ngram 1=76288" means that there are 76288 n-grams containing one token (word), and so on. But what do the negative numbers before and after the tokens mean? Also, I noticed that sometimes the numbers after the tokens are missing. What does that mean?

Thank you very much,
Michael.
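
To make the layout in question concrete, here is a minimal Python sketch that reads the header counts and splits each entry into its leading number, its tokens, and the trailing number when one is present. It only assumes whitespace-separated fields and section markers like those in the excerpt above. The function name `parse_lm_text`, the names `leading` and `trailing`, and the shortened sample (including the last entry, whose trailing number was dropped to show the "missing" case) are hypothetical and not taken from the real file.

```python
import re

def parse_lm_text(text):
    """Split the text of an .lm file into header counts and per-order entries."""
    counts = {}    # n-gram order -> count declared in the \data\ header
    entries = {}   # n-gram order -> list of (leading, tokens, trailing)
    order = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "\\data\\":
            continue
        if line == "\\end\\":
            break
        header = re.fullmatch(r"ngram (\d+)=(\d+)", line)
        if header:
            counts[int(header.group(1))] = int(header.group(2))
            continue
        section = re.fullmatch(r"\\(\d+)-grams:", line)
        if section:
            order = int(section.group(1))
            entries[order] = []
            continue
        if order is None:
            continue  # ignore anything before the first section marker
        fields = line.split()
        leading = float(fields[0])        # number before the tokens
        tokens = fields[1:1 + order]      # exactly `order` tokens
        # The number after the tokens is optional, so it may be absent.
        trailing = float(fields[1 + order]) if len(fields) > 1 + order else None
        entries[order].append((leading, tokens, trailing))
    return counts, entries

# Hypothetical, shortened sample: the count and the last entry are made up.
SAMPLE = (
    "\\data\\\n"
    "ngram 1=3\n"
    "\n"
    "\\1-grams:\n"
    "-2.815075\t!\t-1.648233\n"
    "-6.09184\t#\t-0.1521228\n"
    "-4.628769\t$\n"
    "\n"
    "\\end\\\n"
)

counts, entries = parse_lm_text(SAMPLE)
for leading, tokens, trailing in entries[1]:
    print(leading, tokens, trailing)
```

Tokens are picked by position rather than by pattern, so a token that itself looks like a number is still read as a token.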
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support