Hi Moses,

Introducing kenlm in Moses trunk. You no longer need to download a separate language model toolkit to use Moses; kenlm is distributed with Moses and compiled in by default on UNIX. It is threadsafe language model inference code that returns the same probabilities as SRI (up to floating point rounding). It loads ARPA files in 2/3 the time SRI takes and uses less memory too.

Using kenlm is simple: in your [lmodel-file] section, change the first digit to 8. For example,
"0 0 2 foo.arpa" changes to "8 0 2 foo.arpa" For even faster loading, use the binary format: kenlm/build_binary foo.arpa foo.binary then simply provide the binary filename in your moses.ini e.g. "8 0 2 foo.binary"; it auto detects binary files using magic bytes at the beginning. The code is ready for use and provides correct results. Inference is slower than it should be due to inefficiencies in the Moses-side wrapper code (it does a vocab lookup for all 5 words every time). I'm working on it and once this is done I'll post some benchmarks against SRI and IRST. The binary format is subject to change, but contains a version number so on very rare occasions after, new versions will tell you to rebuild your binary files. Windows is currently not supported (it uses mmap) though I welcome contributions using #ifdef and CreateFileMapping. Have fun and let me know about your experiences with it. "Ken" _______________________________________________ Moses-support mailing list Moses-support@mit.edu http://mailman.mit.edu/mailman/listinfo/moses-support