I've fixed this in revision 3657 and tested that it works with a toy
IRSTLM example.

Sorry about that,

Kenneth

P.S. a faster version is under code review and coming soon.
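
For anyone curious what "not requiring the blank line" looks like in practice, here is a minimal Python sketch. It is not the KenLM code from revision 3657, only an illustration of the idea. The function name read_arpa_sections and the tiny sample model are made up for this example. A section is treated as ending at the next \N-grams: header or at \end\, and blank lines are simply skipped.

```python
import re

# Matches section headers such as "\1-grams:" or "\3-grams:".
SECTION_HEADER = re.compile(r"^\\(\d+)-grams:$")


def read_arpa_sections(lines):
    """Group ARPA entry lines by n-gram order.

    Hypothetical helper for illustration only. Blank lines between
    sections are treated as optional: a section ends when the next
    header or the \\end\\ marker is seen.
    """
    sections = {}
    current_order = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank separators are optional, so skip them
        if line == "\\end\\":
            break
        match = SECTION_HEADER.match(line)
        if match:
            current_order = int(match.group(1))
            sections[current_order] = []
        elif current_order is not None:
            sections[current_order].append(line)
        # Lines before the first header (\data\ and the counts) are ignored.
    return sections


# Made-up toy model, once with blank separators and once without.
with_blanks = [
    "\\data\\", "ngram 1=2", "ngram 2=1", "",
    "\\1-grams:", "-1.0\ta\t-0.5", "-1.0\tb", "",
    "\\2-grams:", "-0.3\ta b", "",
    "\\end\\",
]
without_blanks = [line for line in with_blanks if line]

# Both layouts yield the same sections.
assert read_arpa_sections(with_blanks) == read_arpa_sections(without_blanks)
```

A real loader would also check the entry counts against the \data\ header, which this sketch skips.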

On 10/26/10 03:57, Nicola Bertoldi wrote:
> the empty line after each ngram-block is not mandatory in the ARPA format
> (see http://www.speech.sri.com/projects/srilm/manpages/ngram-format.5.html)
> and IRSTLM does not produce it.
> 
> 
> best regards,
> Nicola Bertoldi
> 
> On Oct 26, 2010, at 9:42 AM, <supp...@precisiontranslationtools.com>
> <supp...@precisiontranslationtools.com> wrote:
> 
>> Hi Ken,
>>
>> I created an iARPA file with IRSTLM using the options -n 3 (3-grams), -b
>> (include the <s> sentence boundary) and -d (subdictionary for ngrams).
>> Then I used IRSTLM's compile-lm with --text yes to convert it to ARPA
>> format.
>>
>> Finally, I ran build_binary to binarize the ARPA format for KenLM. I got
>> the following error:
>>
>> $ build_binary arpa.en.lm arpa.en.binary
>> Reading lm.en.lm
>> ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
>>
>> terminate called after throwing an instance of 'lm::FormatLoadException'
>>   what():  Expected blank line after 3-grams at byte 22348989 in file arpa.en.lm
>> Aborted
>>
>> What am I missing?
>>
>> Thanks,
>> Tom
>>
>>
>> On Fri, 22 Oct 2010 10:15:21 -0400, Kenneth Heafield
>> <mo...@kheafield.com>
>> wrote:
>>> KenLM is inference-only.  It cannot create ARPA files.  So you'll need
>>> to use your favorite toolkit to generate the ARPA.
>>>
>>> On 10/22/10 07:52, supp...@precisiontranslationtools.com wrote:
>>>> Thanks Ken. Nice work.
>>>>
>>>> Is there a way to train the ARPA-formatted LM with KenLM, or do we need
>>>> to train with another tool, like SRILM, or convert IRSTLM output to full
>>>> ARPA format?
>>>>
>>>> Thanks again,
>>>> Tom
>>>>
>>>>
>>>>
>>>> On Mon, 18 Oct 2010 20:31:38 -0400, Kenneth Heafield
>>>> <mo...@kheafield.com>
>>>> wrote:
>>>>> Hi Moses,
>>>>>
>>>>>     Introducing kenlm in Moses trunk.  You no longer need to download a
>>>>> separate language model to use Moses; it's distributed with Moses and
>>>>> compiled in by default on UNIX.  This is threadsafe language model
>>>>> inference code that returns the same probabilities as SRI (up to
>>>>> floating point rounding).  It loads ARPA files in 2/3 the time SRI takes
>>>>> and uses less memory too.  Using kenlm is simple: in your [lmodel-file]
>>>>> section, change the first digit to 8.  For example,
>>>>>
>>>>> "0 0 2 foo.arpa" changes to "8 0 2 foo.arpa"
>>>>>
>>>>>     For even faster loading, use the binary format:
>>>>>
>>>>> kenlm/build_binary foo.arpa foo.binary
>>>>>
>>>>> then simply provide the binary filename in your moses.ini e.g.
>>>>> "8 0 2 foo.binary"; it auto detects binary files using magic bytes at
>>>>> the beginning.
>>>>>
>>>>>     The code is ready for use and provides correct results.  Inference is
>>>>> slower than it should be due to inefficiencies in the Moses-side wrapper
>>>>> code (it does a vocab lookup for all 5 words every time).  I'm working
>>>>> on it and once this is done I'll post some benchmarks against SRI and
>>>>> IRST.  The binary format is subject to change, but contains a version
>>>>> number, so on the very rare occasions when it changes, new versions will
>>>>> tell you to rebuild your binary files.  Windows is currently not supported
>>>>> (it uses mmap) though I welcome contributions using #ifdef and
>>>>> CreateFileMapping.
>>>>>
>>>>>     Have fun and let me know about your experiences with it.
>>>>>
>>>>> "Ken"
>>>>> _______________________________________________
>>>>> Moses-support mailing list
>>>>> Moses-support@mit.edu
>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
> 