Hi,

Could you please explain the format of the .lm file generated by the
ngram-count tool? For example, I got an .lm file that starts with:

\data\
ngram 1=76288
ngram 2=1644644
ngram 3=1410926
ngram 4=1393383
ngram 5=1071864

\1-grams:
-2.815075       !       -1.648233
-3.10526        "       -0.4596801
-6.09184        #       -0.1521228
-4.628769       $       -0.2417951
-3.474399       %       -0.7403963
-4.398747       &       -0.7879647
-2.462822       '       -0.6111439

If I understand correctly, "ngram 1=76288" means that there are 76288 n-grams
containing one token (word), and so on.
But what do the negative numbers before and after the tokens mean? I also
noticed that the number after the token is sometimes missing. What does
that mean?
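
For context, here is a minimal sketch of how such a file could be read. It
assumes the file follows the standard ARPA back-off format, in which the
number before the n-gram is its log10 probability and the optional number
after it is its log10 back-off weight (usually omitted for the highest-order
n-grams and for n-grams that are not a prefix of any longer n-gram). The
function name parse_arpa and the shortened sample are illustrative only and
are not part of SRILM; the 2-gram line in the sample is made up.

```python
import re

# Shortened sample: the 1-gram lines are from the excerpt above,
# the 2-gram line and the counts are hypothetical.
SAMPLE = r"""\data\
ngram 1=3
ngram 2=1

\1-grams:
-2.815075       !       -1.648233
-3.10526        "       -0.4596801
-6.09184        #       -0.1521228

\2-grams:
-0.5    ! #

\end\
"""


def parse_arpa(text):
    """Return (counts, entries) for ARPA-style text.

    counts  maps n-gram order -> number of n-grams declared in the header.
    entries maps a token tuple -> (log10 probability, log10 back-off or None).
    """
    counts = {}
    entries = {}
    order = None  # order of the section currently being read
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "\\data\\":
            continue
        if line == "\\end\\":
            break
        header = re.fullmatch(r"ngram (\d+)=(\d+)", line)
        if header:
            counts[int(header.group(1))] = int(header.group(2))
            continue
        section = re.fullmatch(r"\\(\d+)-grams:", line)
        if section:
            order = int(section.group(1))
            continue
        if order is None:
            continue  # ignore anything before the first n-gram section
        fields = line.split()
        logprob = float(fields[0])
        tokens = tuple(fields[1:1 + order])
        # The back-off weight is present only if a field follows the tokens.
        backoff = float(fields[1 + order]) if len(fields) > 1 + order else None
        entries[tokens] = (logprob, backoff)
    return counts, entries


counts, entries = parse_arpa(SAMPLE)
logprob, backoff = entries[("!",)]
print(counts)
print("P(!) =", 10 ** logprob, "back-off weight =", 10 ** backoff)
```

Under this reading, an entry with no trailing number simply has no stored
back-off weight, which is treated as a log10 weight of 0.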

Thank you very much,
     Michael.