Hi Ken, 

I created an iARPA file with IRSTLM using the options -n 3 (3-grams), -b
(include the <s> sentence boundary) and -d (subdictionary for n-grams).
Then, I used IRSTLM's compile-lm with --text yes to convert to ARPA format.

Finally, I ran build_binary to binarize the ARPA format for KenLM. I got
the following error:

$ build_binary arpa.en.lm arpa.en.binary
Reading lm.en.lm
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
terminate called after throwing an instance of 'lm::FormatLoadException'
  what():  Expected blank line after 3-grams at byte 22348989 in file arpa.en.lm
Aborted

What am I missing?

Thanks,
Tom
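
One way to narrow down an error like the one above is to compare the n-gram
counts declared in the ARPA header with the entries actually present in each
section. The following is a minimal diagnostic sketch, not part of KenLM or
IRSTLM; the helper name check_arpa_counts is hypothetical, and it assumes the
usual ARPA text layout (a \data\ header with "ngram N=COUNT" lines, then
\N-grams: sections, then \end\). A mismatch between declared and found counts
is only one possible cause of this exception.

```python
import re
import sys


def check_arpa_counts(lines):
    """Return {order: (declared, found)} for an ARPA-style text LM.

    declared: count from the "ngram N=COUNT" header lines.
    found:    number of non-blank lines inside the \\N-grams: section.
    """
    declared = {}
    found = {}
    current = None  # order of the section being read; None while in the header
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        header = re.match(r"ngram\s+(\d+)\s*=\s*(\d+)$", line)
        if header and current is None:
            declared[int(header.group(1))] = int(header.group(2))
            continue
        section = re.match(r"\\(\d+)-grams:$", line)
        if section:
            current = int(section.group(1))
            found[current] = 0
            continue
        if line == "\\end\\":
            break
        if line == "\\data\\":
            continue
        if current is not None:
            found[current] += 1
    orders = sorted(set(declared) | set(found))
    return {n: (declared.get(n, 0), found.get(n, 0)) for n in orders}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_arpa.py arpa.en.lm
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for order, (want, got) in check_arpa_counts(handle).items():
            note = "" if want == got else "  <-- mismatch"
            print(f"{order}-grams: declared {want}, found {got}{note}")
```

If the declared and found counts differ for some order, the file probably
needs to be regenerated or its header corrected before binarizing.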


On Fri, 22 Oct 2010 10:15:21 -0400, Kenneth Heafield <mo...@kheafield.com>
wrote:
> KenLM is inference-only.  It cannot create ARPA files.  So you'll need
> to use your favorite toolkit to generate the ARPA.
> 
> On 10/22/10 07:52, supp...@precisiontranslationtools.com wrote:
>> Thanks Ken. Nice work. 
>> 
>> Is there a way to train the ARPA-formatted LM with KenLM, or do we need
>> to train with another tool, like SRILM, or convert IRSTLM to full ARPA
>> format?
>> 
>> Thanks again,
>> Tom
>> 
>> 
>> 
>> On Mon, 18 Oct 2010 20:31:38 -0400, Kenneth Heafield
>> <mo...@kheafield.com>
>> wrote:
>>> Hi Moses,
>>>
>>>     Introducing kenlm in Moses trunk.  You no longer need to download a
>>> separate language model to use Moses; it's distributed with Moses and
>>> compiled in by default on UNIX.  This is threadsafe language model
>>> inference code that returns the same probabilities as SRI (up to
>>> floating point rounding).  It loads ARPA files in 2/3 the time SRI takes
>>> and uses less memory too.  Using kenlm is simple: in your [lmodel-file]
>>> section, change the first digit to 8.  For example,
>>>
>>> "0 0 2 foo.arpa" changes to "8 0 2 foo.arpa"
>>>
>>>     For even faster loading, use the binary format:
>>>
>>> kenlm/build_binary foo.arpa foo.binary
>>>
>>> then simply provide the binary filename in your moses.ini e.g.
>>> "8 0 2 foo.binary"; it auto detects binary files using magic bytes at
>>> the beginning.
>>>
>>>     The code is ready for use and provides correct results.  Inference is
>>> slower than it should be due to inefficiencies in the Moses-side wrapper
>>> code (it does a vocab lookup for all 5 words every time).  I'm working
>>> on it and once this is done I'll post some benchmarks against SRI and
>>> IRST. The binary format is subject to change, but contains a version
>>> number, so on the rare occasions when the format changes, new versions
>>> will tell you to rebuild your binary files.  Windows is currently not
>>> supported (it uses mmap), though I welcome contributions using #ifdef
>>> and CreateFileMapping.
>>>
>>>     Have fun and let me know about your experiences with it.
>>>
>>> "Ken"
>>> _______________________________________________
>>> Moses-support mailing list
>>> Moses-support@mit.edu
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
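
The moses.ini change described in the announcement (setting the first digit
of each [lmodel-file] entry to 8) could be scripted roughly as follows. This
is a sketch under assumptions: the function name use_kenlm is hypothetical,
and it assumes each entry is a whitespace-separated line of the form shown in
the announcement ("0 0 2 foo.arpa"), with section names in square brackets
and comment lines starting with "#".

```python
def use_kenlm(ini_text):
    """Return ini_text with the first field of every [lmodel-file] entry set to 8."""
    out = []
    in_lm_section = False
    for line in ini_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # Track whether we are inside the language model section.
            in_lm_section = stripped == "[lmodel-file]"
        elif in_lm_section and stripped and not stripped.startswith("#"):
            parts = stripped.split(None, 1)  # first field, then the rest
            if len(parts) == 2:
                line = "8 " + parts[1]
        out.append(line)
    return "\n".join(out)
```

For example, use_kenlm("[lmodel-file]\n0 0 2 foo.arpa") leaves the section
header alone and rewrites the entry to "8 0 2 foo.arpa".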
