Ken,

Your new enhancements ROCK! Here are some numbers using rev 3675 and IRSTLM 5.50.01.

Machine: Core2Quad, 2.4 GHz, 4 GB RAM
Data: EN-NL sample data, 37,500 segments (micro test sample)
3-gram LM, 3-gram tables (for fast testing)

Train LM with SRILM
Train tables/tune/eval with Moses/SRILM, multi-threading enabled:
75 minutes
BLEU Score: 0.2531

Train LM with IRSTLM
Train tables/tune/eval with Moses/IRSTLM, binarized memory-mapped, single thread:
195 minutes
BLEU Score: 0.2496

Train LM with IRSTLM (ARPA)
Train tables/tune/eval with Moses/KenLM, binarized memory-mapped, multi-threaded:
50 minutes
BLEU Score: 0.2514

On Wed, 27 Oct 2010 14:15:39 -0400, Kenneth Heafield <mo...@kheafield.com> wrote:
> Revision 3671 introduces an updated version of kenlm. Queries are
> faster now (no more string vocab lookups, state is kept so backoffs cost
> less). The binary format has changed as a result; please rebuild your
> binary files. Timing is forthcoming.
>
> Kenneth
>
> On 10/18/10 20:31, Kenneth Heafield wrote:
>> Hi Moses,
>>
>> Introducing kenlm in Moses trunk. You no longer need to download a
>> separate language model to use Moses; it's distributed with Moses and
>> compiled in by default on UNIX. This is threadsafe language model
>> inference code that returns the same probabilities as SRI (up to
>> floating point rounding). It loads ARPA files in 2/3 the time SRI takes
>> and uses less memory too. Using kenlm is simple: in your [lmodel-file]
>> section, change the first digit to 8. For example,
>>
>> "0 0 2 foo.arpa" changes to "8 0 2 foo.arpa"
>>
>> For even faster loading, use the binary format:
>>
>> kenlm/build_binary foo.arpa foo.binary
>>
>> then simply provide the binary filename in your moses.ini e.g.
>> "8 0 2 foo.binary"; it auto detects binary files using magic bytes at
>> the beginning.
>>
>> The code is ready for use and provides correct results. Inference is
>> slower than it should be due to inefficiencies in the Moses-side wrapper
>> code (it does a vocab lookup for all 5 words every time).
>> I'm working on it and once this is done I'll post some benchmarks against
>> SRI and IRST. The binary format is subject to change, but contains a
>> version number so on very rare occasions after, new versions will tell you
>> to rebuild your binary files. Windows is currently not supported (it uses
>> mmap) though I welcome contributions using #ifdef and CreateFileMapping.
>>
>> Have fun and let me know about your experiences with it.
>>
>> "Ken"
>> _______________________________________________
>> Moses-support mailing list
>> Moses-support@mit.edu
>> http://mailman.mit.edu/mailman/listinfo/moses-support
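
For anyone switching several configs at once: the moses.ini change described in the quoted message (setting the first field of each [lmodel-file] entry to 8) could be scripted roughly as below. This is only a sketch, assuming entries are whitespace-separated as in the "0 0 2 foo.arpa" example and that filenames contain no spaces; the function name use_kenlm is made up for illustration and is not part of Moses or kenlm.

```python
def use_kenlm(ini_text: str) -> str:
    """Return ini_text with the first field of every [lmodel-file] entry set to 8.

    Hypothetical helper: assumes whitespace-separated fields, as in the
    "0 0 2 foo.arpa" example, and filenames without spaces.
    """
    out = []
    in_lm_section = False
    for line in ini_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # A new section header: track whether it is the LM section.
            in_lm_section = (stripped == "[lmodel-file]")
        elif in_lm_section and stripped and not stripped.startswith("#"):
            # An LM entry: replace the implementation id (first field) with 8.
            fields = stripped.split()
            fields[0] = "8"
            line = " ".join(fields)
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    sample = "[lmodel-file]\n0 0 2 foo.arpa\n\n[ttable-limit]\n20"
    print(use_kenlm(sample))
```

Lines outside [lmodel-file] are passed through untouched, so the rest of the config should be unaffected. Keep a backup of the original file before overwriting it.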