Hi,
Could you please explain the format of the .lm file generated by
ngram-count? For example, I got an .lm file that starts with:
\data\
ngram 1=76288
ngram 2=1644644
ngram 3=1410926
ngram 4=1393383
ngram 5=1071864
\1-grams:
-2.815075 ! -1.648233
-3.10526 " -0.4596801
-6.09184 # -0.1521228
-4.628769 $ -0.2417951
-3.474399 % -0.7403963
-4.398747 & -0.7879647
-2.462822 ' -0.6111439
If I understand correctly, "ngram 1=76288" means that there are 76288 n-grams
containing one token (word), "ngram 2=1644644" means that there are 1644644
n-grams containing two tokens, and so on.
But what do the negative numbers before and after the tokens mean? I also
noticed that the number after the token is sometimes missing. What does
that mean?
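
To make my reading of the file concrete, here is a minimal Python sketch of how
I am currently splitting it. It is only my assumption about the layout, not
anything taken from the ngram-count documentation: the function names
(read_counts, split_entry) are my own, and I deliberately leave the two numbers
uninterpreted, since their meaning is exactly what I am asking about.

```python
import re

# Lines copied from the top of my .lm file (only the first entry is shown).
SAMPLE_LINES = [
    "\\data\\",
    "ngram 1=76288",
    "ngram 2=1644644",
    "ngram 3=1410926",
    "ngram 4=1393383",
    "ngram 5=1071864",
    "\\1-grams:",
    "-2.815075 ! -1.648233",
]

# Matches header lines such as "ngram 1=76288".
COUNT_RE = re.compile(r"^ngram\s+(\d+)\s*=\s*(\d+)$")


def read_counts(lines):
    """Return {n-gram order: number of entries} from the \\data\\ header."""
    counts = {}
    in_header = False
    for raw in lines:
        line = raw.strip()
        if line == "\\data\\":
            in_header = True
            continue
        if not in_header or not line:
            continue
        match = COUNT_RE.match(line)
        if match is None:
            break  # first non-count line (e.g. "\1-grams:") ends the header
        counts[int(match.group(1))] = int(match.group(2))
    return counts


def split_entry(line, order):
    """Split one entry into (leading number, tokens, trailing number or None).

    Assumes whitespace-separated fields: one number, then `order` tokens,
    then an optional second number.
    """
    fields = line.split()
    leading = float(fields[0])
    tokens = fields[1:1 + order]
    trailing = float(fields[1 + order]) if len(fields) > 1 + order else None
    return leading, tokens, trailing


counts = read_counts(SAMPLE_LINES)
entry = split_entry(SAMPLE_LINES[-1], order=1)
```

With this, counts maps each order to the number given in the header, and an
entry whose second number is missing comes back with None in the last position.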
Thank you very much,
Michael.