Hi,

My preferred way to build large LMs has been IRSTLM, as it handles large corpora nicely by splitting the task, and the binary LMs it produces work well with Moses. Then I decided to try the new and shiny KenLM. However, when I convert the IRSTLM result to KenLM format, the converted LM gives a much worse BLEU score.
I am building the LMs in this way:

1) build-lm.sh - build the iARPA LM
2) prune-lm
3) compile-lm - convert iARPA to binary IRSTLM (gives me 0.3346 BLEU)
4) compile-lm --text=yes - convert iARPA to ARPA
5) build_binary trie - convert ARPA to KenLM (gives me 0.2543 BLEU)

The moses.ini is the same in both cases, differing only in the LM line ("1 0 5 /mnt/smt/lm/mt4-lv-lcase/lm-pruned.blm" vs "8 0 5 /mnt/smt/lm/mt4-lv-lcase/lm-pruned.mmap"). I have observed this in more than one case. I don't know what to blame: the conversion from iARPA to ARPA, or KenLM.

What is your best practice for estimating large LMs to be converted to KenLM? Is it SRILM?

--
Karlis
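One way to narrow this down might be to check that the ARPA file from step 4 is internally consistent before it goes to build_binary. The sketch below is only a sanity check, not part of IRSTLM or KenLM: it assumes the standard ARPA text layout (a \data\ header with "ngram N=count" lines, then \N-grams: sections, then \end\) and compares the declared counts with the entries actually present. The function name check_arpa and the way the file path is passed are my own choices for illustration.

```python
import re
import sys
from collections import Counter


def check_arpa(lines):
    """Return {order: (declared_count, found_count)} for an ARPA-format LM.

    Assumes the standard ARPA text layout; it does not validate
    probabilities or back-off weights, only the number of entries.
    """
    declared = {}      # counts announced in the \data\ header
    found = Counter()  # entries actually seen in each \N-grams: section
    order = None       # order of the section being read, None outside sections
    in_header = False

    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line == "\\data\\":
            in_header = True
            continue
        if line == "\\end\\":
            break
        section = re.fullmatch(r"\\(\d+)-grams:", line)
        if section:
            order = int(section.group(1))
            in_header = False
            continue
        if in_header:
            count = re.fullmatch(r"ngram\s+(\d+)\s*=\s*(\d+)", line)
            if count:
                declared[int(count.group(1))] = int(count.group(2))
            continue
        if order is not None:
            found[order] += 1  # one n-gram entry per non-empty line

    return {n: (declared[n], found.get(n, 0)) for n in sorted(declared)}


if __name__ == "__main__" and len(sys.argv) > 1:
    # The path is whatever ARPA file step 4 produced (uncompressed text).
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for n, (want, got) in check_arpa(f).items():
            status = "ok" if want == got else "MISMATCH"
            print(f"{n}-grams: declared={want} found={got} {status}")
```

If the counts agree for every order, the file is at least structurally sound, and the difference would more likely come from the probabilities themselves or from how the LM is queried. A mismatch would point at the iARPA to ARPA step.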
