Hi Ken,

I created an iARPA file with IRSTLM using the options -n 3 (3-grams), -b (include the <s> sentence boundary) and -d (subdictionary for n-grams). Then I used IRSTLM's compile-lm with --text yes to convert it to ARPA format.
Finally, I ran build_binary to binarize the ARPA file for KenLM and got the following error:

$ build_binary arpa.en.lm arpa.en.binary
Reading lm.en.lm
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
terminate called after throwing an instance of 'lm::FormatLoadException'
  what():  Expected blank line after 3-grams at byte 22348989 in file arpa.en.lm
Aborted

What am I missing?

Thanks,
Tom

On Fri, 22 Oct 2010 10:15:21 -0400, Kenneth Heafield <mo...@kheafield.com> wrote:
> KenLM is inference-only. It cannot create ARPA files, so you'll need
> to use your favorite toolkit to generate the ARPA file.
>
> On 10/22/10 07:52, supp...@precisiontranslationtools.com wrote:
>> Thanks Ken. Nice work.
>>
>> Is there a way to train the ARPA-formatted LM with KenLM, or do we need
>> to train with another tool, like SRILM, or convert IRSTLM to full ARPA
>> format?
>>
>> Thanks again,
>> Tom
>>
>> On Mon, 18 Oct 2010 20:31:38 -0400, Kenneth Heafield <mo...@kheafield.com> wrote:
>>> Hi Moses,
>>>
>>> Introducing kenlm in Moses trunk. You no longer need to download a
>>> separate language model to use Moses; it's distributed with Moses and
>>> compiled in by default on UNIX. This is threadsafe language model
>>> inference code that returns the same probabilities as SRI (up to
>>> floating point rounding). It loads ARPA files in 2/3 the time SRI takes
>>> and uses less memory too. Using kenlm is simple: in your [lmodel-file]
>>> section, change the first digit to 8. For example,
>>>
>>> "0 0 2 foo.arpa" changes to "8 0 2 foo.arpa"
>>>
>>> For even faster loading, use the binary format:
>>>
>>> kenlm/build_binary foo.arpa foo.binary
>>>
>>> then simply provide the binary filename in your moses.ini, e.g.
>>> "8 0 2 foo.binary"; it auto-detects binary files using magic bytes at
>>> the beginning.
>>>
>>> The code is ready for use and provides correct results.
>>> Inference is slower than it should be due to inefficiencies in the
>>> Moses-side wrapper code (it does a vocab lookup for all 5 words every
>>> time). I'm working on it, and once this is done I'll post some
>>> benchmarks against SRI and IRST. The binary format is subject to
>>> change, but it contains a version number, so on very rare occasions
>>> new versions will tell you to rebuild your binary files. Windows is
>>> currently not supported (it uses mmap), though I welcome contributions
>>> using #ifdef and CreateFileMapping.
>>>
>>> Have fun and let me know about your experiences with it.
>>>
>>> "Ken"
>>> _______________________________________________
>>> Moses-support mailing list
>>> Moses-support@mit.edu
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
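A plausible reading of the "Expected blank line after 3-grams" error in Tom's message is that the loader reads the number of entries declared for each order in the \data\ header ("ngram 3=...") and then expects a blank line. If so, either the 3-gram section holds more entries than the header declares, or the section is not followed by a blank line before \end\. The sketch below is a hypothetical standalone checker (`check_arpa` and `check_arpa.py` are illustrative names, not part of KenLM or IRSTLM) that tests both conditions on an ARPA file; it is a diagnostic aid under those assumptions, not KenLM's own parser.

```python
import re
import sys

# Section header such as "\3-grams:" and header count line such as "ngram 3=12345".
HEADER_RE = re.compile(r"\\(\d+)-grams:")
COUNT_RE = re.compile(r"ngram\s+(\d+)\s*=\s*(\d+)")


def check_arpa(lines):
    r"""Return a list of human-readable problems found in an ARPA LM.

    Assumed requirements (a strict loader may check more than this):
    1. each \N-grams: section has as many entries as 'ngram N=count' declares;
    2. a blank line precedes every \N-grams: header and the \end\ marker.
    Lines are stripped, so whitespace-only lines count as blank here.
    """
    declared, actual, problems = {}, {}, []
    section = None  # None, "data", or the int order of the current section
    prev = ""       # previous line, stripped
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        m = HEADER_RE.fullmatch(line)
        if m or line == "\\end\\":
            if prev != "":
                problems.append(f"line {lineno}: no blank line before {line}")
            if m is None:  # reached the \end\ marker
                break
            section = int(m.group(1))
            actual[section] = 0
        elif line == "\\data\\":
            section = "data"
        elif line:
            if section == "data":
                c = COUNT_RE.fullmatch(line)
                if c:
                    declared[int(c.group(1))] = int(c.group(2))
            elif isinstance(section, int):
                actual[section] += 1  # one n-gram entry in the current section
        prev = line
    # Compare declared vs. actual counts for every order seen anywhere.
    for order in sorted(set(declared) | set(actual)):
        if declared.get(order) != actual.get(order):
            problems.append(
                f"{order}-grams: header declares {declared.get(order)}, "
                f"found {actual.get(order)}")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        found = check_arpa(f)
    print("\n".join(found) if found else "no problems found")
```

Usage, e.g. `python check_arpa.py arpa.en.lm`. If it reports nothing, the counts and blank lines look consistent and the problem is more likely a malformed entry near the byte offset given in the error message.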