Hi Moses,

        Introducing kenlm in Moses trunk.  You no longer need to download a
separate language model library to use Moses; kenlm is distributed with
Moses and compiled in by default on UNIX.  It is thread-safe language
model inference code that returns the same probabilities as SRI (up to
floating point rounding).  It loads ARPA files in 2/3 the time SRI takes
and uses less memory too.  Using kenlm is simple: in your [lmodel-file]
section, change the first digit of each entry to 8.  For example,

"0 0 2 foo.arpa" changes to "8 0 2 foo.arpa"
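
        If you have many configs to convert, the edit above can be scripted.
This is only a rough sketch, not something shipped with Moses: the function
name use_kenlm is made up, and it assumes every non-blank, non-comment line
under [lmodel-file] is a language model entry whose first field selects the
implementation, and that comment lines start with "#".

```python
def use_kenlm(ini_text: str) -> str:
    """Return moses.ini text with every [lmodel-file] entry switched to type 8.

    Hypothetical helper: assumes whitespace-separated fields and '#' comments.
    """
    out = []
    in_lm_section = False
    for line in ini_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # A new section header; remember whether it is the LM section.
            in_lm_section = (stripped == "[lmodel-file]")
        elif in_lm_section and stripped and not stripped.startswith("#"):
            fields = stripped.split()
            fields[0] = "8"  # first field selects the LM implementation
            line = " ".join(fields)
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    print(use_kenlm("[lmodel-file]\n0 0 2 foo.arpa"))
```

Entries in other sections are left untouched, so it should be safe to run over
a whole moses.ini, but do check the result by eye.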

        For even faster loading, use the binary format:

kenlm/build_binary foo.arpa foo.binary

then simply provide the binary filename in your moses.ini, e.g.
"8 0 2 foo.binary"; kenlm detects binary files automatically using magic
bytes at the beginning of the file.
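
        For anyone unfamiliar with magic-byte detection, here is a minimal
illustration of the idea.  It is not kenlm's code: the MAGIC value and the
function name looks_binary are placeholders I made up, and the real magic
string lives in the kenlm source.

```python
# Placeholder magic string for illustration only; NOT the real kenlm value.
MAGIC = b"EXAMPLE-LM-BINARY\n"


def looks_binary(path: str, magic: bytes = MAGIC) -> bool:
    """Return True if the file at `path` starts with the given magic bytes."""
    with open(path, "rb") as f:
        # Read exactly as many bytes as the magic string and compare.
        return f.read(len(magic)) == magic
```

A loader can call a check like this first and fall back to parsing the file
as ARPA text when it returns False, which is why the same moses.ini line
works for both formats.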

        The code is ready for use and provides correct results.  Inference is
slower than it should be due to inefficiencies in the Moses-side wrapper
code (it does a vocab lookup for all 5 words every time).  I'm working
on it and once this is done I'll post some benchmarks against SRI and
IRST.  The binary format is subject to change, but it contains a version
number, so on the rare occasions when the format changes, new versions
will tell you to rebuild your binary files.  Windows is currently not
supported because kenlm uses mmap, though I welcome contributions using
#ifdef and CreateFileMapping.
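
        To show what the version check amounts to, here is a toy sketch in
Python.  Everything about the layout is an assumption for illustration (a
4-byte little-endian version number at offset 0, and SUPPORTED_VERSION = 2);
the real kenlm header is different.  Python's mmap module hides the
mmap/CreateFileMapping difference, which is the part a Windows port of the
C++ code would have to handle with #ifdef.

```python
import mmap
import struct

SUPPORTED_VERSION = 2             # hypothetical version this reader understands
HEADER = struct.Struct("<I")      # hypothetical layout: uint32 version at offset 0


def write_demo(path: str, version: int, payload: bytes = b"") -> None:
    """Write a toy binary file: version header followed by payload."""
    with open(path, "wb") as f:
        f.write(HEADER.pack(version) + payload)


def check_version(path: str) -> int:
    """Map the file read-only and verify its version number."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            (version,) = HEADER.unpack_from(m, 0)
    if version != SUPPORTED_VERSION:
        raise ValueError(
            f"{path}: binary format version {version}, expected "
            f"{SUPPORTED_VERSION}; please rebuild the binary file"
        )
    return version
```

With a check like this, a stale file fails at load time with a clear message
instead of being silently misread.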

        Have fun and let me know about your experiences with it.

Moses-support mailing list
