Hi,

My preferred way to build large LMs has been IRSTLM, as it handles large
corpora nicely by splitting the task. The binary LMs it produces work well
with Moses. Then I decided to try the new and shiny KenLM. However, when I
convert the result to KenLM format, the converted LM gives a much worse BLEU
score.

I am building the LMs in this way:

1) build-lm.sh - build iARPA LM
2) prune-lm
3) compile-lm - convert iARPA to binary IRSTLM (gives me 0.3346 BLEU)
4) compile-lm --text=yes - convert iARPA to ARPA
5) build_binary trie - convert ARPA to KenLM (gives me 0.2543 BLEU)
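
One way to narrow down whether step 4 is at fault could be to sanity-check the
ARPA file it writes before running build_binary. The Python sketch below is
only an illustration: the function name check_arpa and the tiny sample model
are hypothetical, not part of IRSTLM or KenLM. It counts the entries in each
\N-grams: section, compares them with the "ngram N=count" lines in the \data\
header, and flags any log10 probability above 0. It assumes an uncompressed,
plain-text ARPA file, and a clean result does not prove that the probabilities
themselves survived the conversion intact.

```python
import io
import re
from typing import Dict, TextIO


def check_arpa(stream: TextIO) -> Dict[int, int]:
    """Compare per-order n-gram counts in an ARPA file with its \\data\\ header.

    Returns the observed counts. Raises ValueError if the counts disagree or
    if any entry has a log10 probability above 0.
    """
    declared: Dict[int, int] = {}
    observed: Dict[int, int] = {}
    section = None  # "data", an n-gram order (int), or None before \data\
    for raw in stream:
        line = raw.strip()
        if not line:
            continue
        if line == "\\data\\":
            section = "data"
            continue
        if line == "\\end\\":
            break
        header = re.fullmatch(r"\\(\d+)-grams:", line)
        if header:
            section = int(header.group(1))
            observed[section] = 0
            continue
        if section == "data":
            count = re.fullmatch(r"ngram\s+(\d+)\s*=\s*(\d+)", line)
            if count:
                declared[int(count.group(1))] = int(count.group(2))
        elif isinstance(section, int):
            logprob = float(line.split()[0])  # first field is log10 P
            if logprob > 0:
                raise ValueError(f"log10 prob > 0 in {section}-grams: {line!r}")
            observed[section] += 1
    if declared != observed:
        raise ValueError(f"header declares {declared}, file contains {observed}")
    return observed


# Hypothetical two-gram model, only to show the expected layout.
SAMPLE = "\n".join([
    "\\data\\",
    "ngram 1=3",
    "ngram 2=2",
    "",
    "\\1-grams:",
    "-1.0 <s> -0.3",
    "-0.5 hello -0.2",
    "-0.7 </s>",
    "",
    "\\2-grams:",
    "-0.2 <s> hello",
    "-0.4 hello </s>",
    "",
    "\\end\\",
])
print(check_arpa(io.StringIO(SAMPLE)))  # {1: 3, 2: 2}

# On a real model (hypothetical path):
# with open("lm-pruned.arpa", encoding="utf-8") as f:
#     check_arpa(f)
```

If the counts match and no probability is positive, the ARPA file is at least
structurally consistent, which would point the search elsewhere.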

The moses.ini is the same in both cases, differing only in the LM line
("1 0 5 /mnt/smt/lm/mt4-lv-lcase/lm-pruned.blm" vs "8 0 5
/mnt/smt/lm/mt4-lv-lcase/lm-pruned.mmap"). I have observed this in more than
one case.
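
To make the comparison above explicit, here is a small Python sketch that
splits such an LM line into its four fields: implementation, factor, order and
path. The code-to-name table contains only the two codes used in the lines
quoted above (1 for the IRSTLM binary, 8 for KenLM); the names LMEntry and
parse_lmodel_line are hypothetical and not part of Moses.

```python
from dataclasses import dataclass

# Only the codes seen in the two moses.ini lines above; anything else is
# reported as unknown rather than guessed.
LM_IMPLEMENTATIONS = {1: "IRSTLM", 8: "KenLM"}


@dataclass
class LMEntry:
    implementation: str
    factor: int
    order: int
    path: str


def parse_lmodel_line(line: str) -> LMEntry:
    """Split a '<implementation> <factor> <order> <path>' LM line."""
    impl, factor, order, path = line.split(maxsplit=3)
    code = int(impl)
    return LMEntry(
        implementation=LM_IMPLEMENTATIONS.get(code, f"unknown ({code})"),
        factor=int(factor),
        order=int(order),
        path=path.strip().strip('"'),  # tolerate a quoted path
    )


irst = parse_lmodel_line("1 0 5 /mnt/smt/lm/mt4-lv-lcase/lm-pruned.blm")
ken = parse_lmodel_line("8 0 5 /mnt/smt/lm/mt4-lv-lcase/lm-pruned.mmap")
# Only the implementation and the model file differ; factor and order match.
assert (irst.factor, irst.order) == (ken.factor, ken.order) == (0, 5)
print(irst.implementation, ken.implementation)  # IRSTLM KenLM
```

Since factor 0 and order 5 are identical in both lines, the BLEU gap should
come from the model files or the LM implementation rather than from these
settings.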

I don't know whom to blame: the conversion from iARPA to ARPA, or KenLM.

What is your best practice for estimating large LMs to be converted to KenLM? 
Is it SRILM?

--
Karlis

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
