[Moses-support] Using KENLM

Kārlis Goba Mon, 07 Feb 2011 07:54:35 -0800

Hi,

My preferred way to build large LMs has been IRSTLM as it can handle large 
corpora nicely by splitting the task. The produced binary LMs work well with 
Moses. Then I decided to try the new and shiny KenLM. However, when I convert 
the result to KenLM format, the converted LM gives a much worse BLEU score.


I am building the LMs in this way:

1) build-lm.sh - build iARPA LM
2) prune-lm
3) compile-lm - convert iARPA to binary IRSTLM (gives me 0.3346 BLEU)
4) compile-lm --text=yes - convert iARPA to ARPA
5) build_binary trie - convert ARPA to KenLM (gives me 0.2543 BLEU)

The moses.ini is the same in both cases, differing only in the LM line 
("1 0 5 /mnt/smt/lm/mt4-lv-lcase/lm-pruned.blm" vs 
"8 0 5 /mnt/smt/lm/mt4-lv-lcase/lm-pruned.mmap"). I have observed this drop in 
more than one case.

I don't know what to blame: the conversion from iARPA to ARPA, or KenLM.
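
One way to narrow this down might be a structural sanity check on the ARPA 
file produced in step 4, before it goes into build_binary. Below is a minimal 
sketch in Python, assuming the usual ARPA layout (a \data\ header with 
"ngram N=count" lines, then \N-grams: sections, then \end\). The function name 
check_arpa is hypothetical and not part of IRSTLM or KenLM. It only checks 
that the header counts match the entries and that the log10 probabilities are 
not positive; it cannot tell whether the probabilities themselves are right.

```python
import re


def check_arpa(lines):
    """Structurally check ARPA-format text given as an iterable of lines.

    Returns (declared, found, bad): declared and found map n-gram order to
    the count from the header and the count of entries actually seen; bad
    lists (line_number, text) for entries that look malformed.
    """
    declared, found, bad = {}, {}, []
    order = None       # order of the current \N-grams: section
    in_header = False  # True between \data\ and the first section
    for lineno, raw in enumerate(lines, 1):
        line = raw.strip()
        if not line:
            continue
        if line == "\\data\\":
            in_header = True
            continue
        if line == "\\end\\":
            break
        section = re.fullmatch(r"\\(\d+)-grams:", line)
        if section:
            order = int(section.group(1))
            in_header = False
            found.setdefault(order, 0)
            continue
        if in_header:
            count = re.fullmatch(r"ngram\s+(\d+)\s*=\s*(\d+)", line)
            if count:
                declared[int(count.group(1))] = int(count.group(2))
            continue
        if order is None:
            continue  # skip anything that precedes the first section
        fields = line.split()
        found[order] += 1
        try:
            logprob = float(fields[0])
        except ValueError:
            bad.append((lineno, line))
            continue
        # An entry is: log10 prob, N words, and an optional backoff weight.
        if logprob > 0 or len(fields) not in (order + 1, order + 2):
            bad.append((lineno, line))
    return declared, found, bad
```

To use it, pass an open file, for example 
check_arpa(open(path, encoding="utf-8")), where path points at the ARPA file. 
If declared and found differ, or bad is not empty, that would point towards 
the conversion step rather than KenLM.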

What is your best practice for estimating large LMs to be converted to KenLM? 
Is it SRILM?

--
Karlis
