[Moses-support] KenLM now does estimation

Kenneth Heafield Fri, 18 Jan 2013 08:33:58 -0800

Dear Moses,

        KenLM now estimates modified Kneser-Ney language models from text. 
Estimation uses streaming on-disk algorithms with a memory buffer size 
that you choose, so you can build much larger language models (e.g. on 
all the data allowed by WMT 2013) without running out of RAM.


        It is available in Moses master as of commit fc5868d and as a 
standalone download from http://kheafield.com/code/kenlm.tar.gz.  The 
command line is relatively simple:

bin/lmplz -o 5 <text >text.arpa

The memory usage (-S 80%) and temporary file location (-T /tmp) options 
use the same syntax as the corresponding GNU sort options.
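
As a rough sketch of how these options combine, the script below builds 
the same 5-gram command with an explicit memory budget and temporary 
directory. The variable names (LMPLZ, CORPUS, ARPA, TMPDIR_LM) and their 
default values are placeholders for illustration only; the flags 
themselves are the ones described above. It prints the command and runs 
it only if the binary and the input file are actually present.

```shell
#!/bin/sh
# Hypothetical placeholders; adjust to your own layout.
LMPLZ=${LMPLZ:-bin/lmplz}        # path to the lmplz binary
CORPUS=${CORPUS:-text}           # tokenized training text, one sentence per line
ARPA=${ARPA:-text.arpa}          # output ARPA file
TMPDIR_LM=${TMPDIR_LM:-/tmp}     # where temporary files go

# -o: model order; -S: memory budget (GNU sort syntax); -T: temp file location.
# Note: this simple string form assumes the paths contain no whitespace.
cmd="$LMPLZ -o 5 -S 80% -T $TMPDIR_LM"

# Show what would be run.
echo "$cmd <$CORPUS >$ARPA"

# Run only when the binary and the corpus are both present.
if [ -x "$LMPLZ" ] && [ -r "$CORPUS" ]; then
    $cmd <"$CORPUS" >"$ARPA"
fi
```

Pointing -T at a disk with plenty of free space is probably the setting 
that matters most for large corpora, since the streaming algorithms spill 
to temporary files there.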

        There is NO PRUNING, so the comparable SRILM command line is:

ngram-count -order 5 -interpolate -kndiscount -unk -gt3min 1 -gt4min 1 \
    -gt5min 1 -text text -lm text.arpa

For more documentation, see http://kheafield.com/code/kenlm/estimation/ .

Kenneth
