The empty line after each n-gram block is not mandatory in the ARPA format (see http://www.speech.sri.com/projects/srilm/manpages/ngram-format.5.html), and IRSTLM does not produce it.
best regards,
Nicola Bertoldi

On Oct 26, 2010, at 9:42 AM, <supp...@precisiontranslationtools.com> wrote:

> Hi Ken,
>
> I created an iARPA file with IRSTLM using the options -n 3 (3-grams),
> -b (include the <s> sentence boundary) and -d (subdictionary for
> ngrams). Then, I used IRSTLM's compile-lm with --text yes to convert
> to ARPA format.
>
> Finally, I ran build_binary to binarize the ARPA format for KenLM.
> I got the following error:
>
> $ build_binary arpa.en.lm arpa.en.binary
> Reading lm.en.lm
> ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
> terminate called after throwing an instance of 'lm::FormatLoadException'
> what(): Expected blank line after 3-grams at byte 22348989 in file arpa.en.lm
> Aborted
>
> What am I missing?
>
> Thanks,
> Tom
>
>
> On Fri, 22 Oct 2010 10:15:21 -0400, Kenneth Heafield <mo...@kheafield.com> wrote:
>> KenLM is inference-only. It cannot create ARPA files. So you'll need
>> to use your favorite toolkit to generate the ARPA.
>>
>> On 10/22/10 07:52, supp...@precisiontranslationtools.com wrote:
>>> Thanks Ken. Nice work.
>>>
>>> Is there a way to train the ARPA formatted LM with KenLM, or do we
>>> need to train with another tool, like SRILM, or convert IRSTLM to
>>> full ARPA format?
>>>
>>> Thanks again,
>>> Tom
>>>
>>>
>>> On Mon, 18 Oct 2010 20:31:38 -0400, Kenneth Heafield <mo...@kheafield.com> wrote:
>>>> Hi Moses,
>>>>
>>>> Introducing kenlm in Moses trunk. You no longer need to download a
>>>> separate language model to use Moses; it's distributed with Moses
>>>> and compiled in by default on UNIX. This is threadsafe language
>>>> model inference code that returns the same probabilities as SRI
>>>> (up to floating point rounding). It loads ARPA files in 2/3 the
>>>> time SRI takes and uses less memory too.
>>>>
>>>> Using kenlm is simple: in your [lmodel-file] section, change the
>>>> first digit to 8. For example,
>>>>
>>>> "0 0 2 foo.arpa" changes to "8 0 2 foo.arpa"
>>>>
>>>> For even faster loading, use the binary format:
>>>>
>>>> kenlm/build_binary foo.arpa foo.binary
>>>>
>>>> then simply provide the binary filename in your moses.ini, e.g.
>>>> "8 0 2 foo.binary"; it auto-detects binary files using magic bytes
>>>> at the beginning.
>>>>
>>>> The code is ready for use and provides correct results. Inference
>>>> is slower than it should be due to inefficiencies in the Moses-side
>>>> wrapper code (it does a vocab lookup for all 5 words every time).
>>>> I'm working on it, and once this is done I'll post some benchmarks
>>>> against SRI and IRST. The binary format is subject to change, but
>>>> contains a version number, so on the very rare occasions that it
>>>> does change, new versions will tell you to rebuild your binary
>>>> files. Windows is currently not supported (it uses mmap), though I
>>>> welcome contributions using #ifdef and CreateFileMapping.
>>>>
>>>> Have fun and let me know about your experiences with it.
>>>>
>>>> "Ken"
>>>> _______________________________________________
>>>> Moses-support mailing list
>>>> Moses-support@mit.edu
>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
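
A possible workaround for the "Expected blank line after 3-grams" error quoted above, offered only as a sketch: since IRSTLM omits the optional blank line and the build_binary in this thread insists on it, the ARPA text file can be post-processed to add the separators before binarizing. The function names add_block_separators and fix_arpa_file are hypothetical helpers, not part of IRSTLM or KenLM, and the sketch assumes a plain-text ARPA file with the usual \N-grams: and \end\ section markers.

```python
import re

# Matches ARPA section markers such as "\1-grams:" or the closing "\end\".
SECTION = re.compile(r"^\\(\d+-grams:|end\\)\s*$")


def add_block_separators(lines):
    """Yield ARPA lines, inserting a blank line before each section marker
    (an N-grams header or the end marker) when the previous line is not
    already blank. Lines are yielded without trailing newlines."""
    previous_blank = True  # nothing to separate at the very start of the file
    for line in lines:
        text = line.rstrip("\r\n")
        if SECTION.match(text) and not previous_blank:
            yield ""
        yield text
        previous_blank = text.strip() == ""


def fix_arpa_file(src_path, dst_path):
    """Copy src_path to dst_path, adding the missing blank separators."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for text in add_block_separators(src):
            dst.write(text + "\n")
```

With the file names from the error message, this could be called as fix_arpa_file("arpa.en.lm", "arpa.fixed.lm") (the output name is made up), after which build_binary would be run on the fixed file. Files that already contain the blank lines pass through unchanged.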