I've fixed this in revision 3657 and tested that it works with a toy IRSTLM example.
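
For anyone reading ARPA files with their own scripts, the tolerant behavior can be sketched in a few lines of Python: blank lines between n-gram blocks are accepted but never required. This is only an illustration, not the KenLM code (which is C++); the name count_arpa_entries and the sample data are made up for the example.

```python
import re

# Matches section headers such as "\1-grams:" or "\3-grams:".
SECTION = re.compile(r"^\\(\d+)-grams:$")

def count_arpa_entries(lines):
    """Count n-gram entries per order, treating blank lines as optional."""
    counts = {}
    order = None  # order of the section currently being read
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # a blank separator is allowed but never required
        if line == "\\end\\":
            break
        match = SECTION.match(line)
        if match:
            order = int(match.group(1))
            counts[order] = 0
        elif order is not None:
            counts[order] += 1  # entry line: log10 prob, words, optional backoff
        # Lines before the first section (\data\ and "ngram N=count") are skipped.
    return counts

# Hypothetical toy model, once with and once without blank separators.
with_blanks = [
    "\\data\\", "ngram 1=2", "ngram 2=1", "",
    "\\1-grams:", "-1.0\t<s>\t-0.3", "-0.5\ta", "",
    "\\2-grams:", "-0.2\t<s> a", "",
    "\\end\\",
]
without_blanks = [line for line in with_blanks if line]

print(count_arpa_entries(with_blanks) == count_arpa_entries(without_blanks))
```

Both variants give the same counts, which is the point: the section header itself marks the end of the previous block.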

Sorry about that,
Kenneth

P.S. A faster version is under code review and coming soon.

On 10/26/10 03:57, Nicola Bertoldi wrote:
> the empty line after each ngram-block is not mandatory in the ARPA format
> (see http://www.speech.sri.com/projects/srilm/manpages/ngram-format.5.html)
> and IRSTLM does not produce it.
>
> best regards,
> Nicola Bertoldi
>
> On Oct 26, 2010, at 9:42 AM, <supp...@precisiontranslationtools.com>
> <supp...@precisiontranslationtools.com> wrote:
>
>> Hi Ken,
>>
>> I created an iARPA file with IRSTLM using the options -n 3 (3-grams),
>> -b (include the <s> sentence boundary) and -d (subdictionary for
>> ngrams). Then, I used IRSTLM's compile-lm with --text yes to convert
>> to ARPA format.
>>
>> Finally, I ran build_binary to binarize the ARPA format for KenLM. I got
>> the following error:
>>
>> $ build_binary arpa.en.lm arpa.en.binary
>> Reading lm.en.lm
>> ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
>>
>> terminate called after throwing an instance of 'lm::FormatLoadException'
>> what(): Expected blank line after 3-grams at byte 22348989 in file arpa.en.lm
>> Aborted
>>
>> What am I missing?
>>
>> Thanks,
>> Tom
>>
>>
>> On Fri, 22 Oct 2010 10:15:21 -0400, Kenneth Heafield
>> <mo...@kheafield.com>
>> wrote:
>>> KenLM is inference-only. It cannot create ARPA files. So you'll need
>>> to use your favorite toolkit to generate the ARPA.
>>>
>>> On 10/22/10 07:52, supp...@precisiontranslationtools.com wrote:
>>>> Thanks Ken. Nice work.
>>>>
>>>> Is there a way to train the ARPA formatted LM with KenLM, or do we
>>>> need to train with another tool, like SRILM, or convert IRSTLM to
>>>> full ARPA format?
>>>>
>>>> Thanks again,
>>>> Tom
>>>>
>>>>
>>>> On Mon, 18 Oct 2010 20:31:38 -0400, Kenneth Heafield
>>>> <mo...@kheafield.com>
>>>> wrote:
>>>>> Hi Moses,
>>>>>
>>>>> Introducing kenlm in Moses trunk.
>>>>> You no longer need to download a separate language model to use
>>>>> Moses; it's distributed with Moses and compiled in by default on
>>>>> UNIX. This is threadsafe language model inference code that returns
>>>>> the same probabilities as SRI (up to floating point rounding). It
>>>>> loads ARPA files in 2/3 the time SRI takes and uses less memory too.
>>>>> Using kenlm is simple: in your [lmodel-file] section, change the
>>>>> first digit to 8. For example,
>>>>>
>>>>> "0 0 2 foo.arpa" changes to "8 0 2 foo.arpa"
>>>>>
>>>>> For even faster loading, use the binary format:
>>>>>
>>>>> kenlm/build_binary foo.arpa foo.binary
>>>>>
>>>>> then simply provide the binary filename in your moses.ini e.g.
>>>>> "8 0 2 foo.binary"; it auto detects binary files using magic bytes at
>>>>> the beginning.
>>>>>
>>>>> The code is ready for use and provides correct results. Inference is
>>>>> slower than it should be due to inefficiencies in the Moses-side
>>>>> wrapper code (it does a vocab lookup for all 5 words every time). I'm
>>>>> working on it and once this is done I'll post some benchmarks against
>>>>> SRI and IRST. The binary format is subject to change, but contains a
>>>>> version number, so on the very rare occasions when it changes, new
>>>>> versions will tell you to rebuild your binary files. Windows is
>>>>> currently not supported (it uses mmap) though I welcome contributions
>>>>> using #ifdef and CreateFileMapping.
>>>>>
>>>>> Have fun and let me know about your experiences with it.
>>>>>
>>>>> "Ken"
>>>>> _______________________________________________
>>>>> Moses-support mailing list
>>>>> Moses-support@mit.edu
>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
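
The moses.ini change described in the original announcement (first digit of each [lmodel-file] entry set to 8) can also be scripted. Below is a rough Python sketch, assuming the usual moses.ini layout of a [section] header followed by one entry per line; switch_to_kenlm is a hypothetical helper, not part of Moses or KenLM, and the sample config is invented.

```python
def switch_to_kenlm(ini_text):
    """Set the first field of every [lmodel-file] entry to 8 (kenlm)."""
    out = []
    in_lm_section = False
    for line in ini_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # A new section header: track whether it is the LM section.
            in_lm_section = (stripped == "[lmodel-file]")
        elif in_lm_section and stripped and not stripped.startswith("#"):
            fields = stripped.split()
            fields[0] = "8"  # implementation code for kenlm, per the announcement
            line = " ".join(fields)
        out.append(line)
    return "\n".join(out)

# Hypothetical config fragment; only the [lmodel-file] entry should change.
example = "[lmodel-file]\n0 0 2 foo.arpa\n\n[ttable-file]\n0 0 0 5 phrase-table\n"
print(switch_to_kenlm(example))
```

Entries in other sections are left untouched, so the phrase table line keeps its leading 0.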