The empty line after each n-gram block is not mandatory in the ARPA format
(see http://www.speech.sri.com/projects/srilm/manpages/ngram-format.5.html),
and IRSTLM does not produce it.
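
If the reader you use insists on that empty line, as the error quoted below suggests, one possible workaround is to add it yourself before binarizing. The following is only a minimal sketch in Python, not part of IRSTLM or KenLM; the names add_blank_lines and fix_file are hypothetical, and it assumes section headers look like "\1-grams:" and the file ends with "\end\".

```python
import re

# Matches ARPA section headers such as "\1-grams:" and the closing "\end\" marker.
SECTION_RE = re.compile(r"^\\(\d+-grams:|end\\)\s*$")


def add_blank_lines(lines):
    """Yield ARPA lines without their newline, inserting an empty line
    before each n-gram section header and before the end marker when
    the preceding line is not already empty."""
    previous_blank = True  # nothing to separate at the very start of the file
    for line in lines:
        stripped = line.rstrip("\r\n")
        if SECTION_RE.match(stripped) and not previous_blank:
            yield ""
        yield stripped
        previous_blank = stripped.strip() == ""


def fix_file(src, dst):
    """Copy the ARPA file src to dst, adding the missing empty lines.
    Assumption: the file is UTF-8 text; adjust the encoding if needed."""
    with open(src, encoding="utf-8") as fin, \
            open(dst, "w", encoding="utf-8") as fout:
        for out_line in add_blank_lines(fin):
            fout.write(out_line + "\n")
```

For example, fix_file("arpa.en.lm", "arpa.en.fixed.lm") would write a copy that could then be passed to build_binary; the output file name here is only an illustration. Running the function on a file that already has the empty lines leaves it unchanged.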


best regards,
Nicola Bertoldi

On Oct 26, 2010, at 9:42 AM, <supp...@precisiontranslationtools.com>  
<supp...@precisiontranslationtools.com> wrote:

> Hi Ken,
>
> I created an iARPA file with IRSTLM using the options -n 3 (3-grams), -b
> (include the <s> sentence boundary) and -d (subdictionary for ngrams).
> Then, I used IRSTLM's compile-lm with --text yes to convert to ARPA
> format.
>
> Finally, I ran build_binary to binarize the ARPA format for KenLM.  I got
> the following error:
>
> $ build_binary arpa.en.lm arpa.en.binary
> Reading lm.en.lm
> ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
> terminate called after throwing an instance of 'lm::FormatLoadException'
>   what():  Expected blank line after 3-grams at byte 22348989 in file arpa.en.lm
> Aborted
>
> What am I missing?
>
> Thanks,
> Tom
>
>
> On Fri, 22 Oct 2010 10:15:21 -0400, Kenneth Heafield <mo...@kheafield.com> wrote:
>> KenLM is inference-only.  It cannot create ARPA files.  So you'll need
>> to use your favorite toolkit to generate the ARPA.
>>
>> On 10/22/10 07:52, supp...@precisiontranslationtools.com wrote:
>>> Thanks Ken. Nice work.
>>>
>>> Is there a way to train the ARPA formatted LM with KenLM, or do we need
>>> to train with another tool, like SRILM, or convert IRSTLM to full ARPA
>>> format?
>>>
>>> Thanks again,
>>> Tom
>>>
>>>
>>>
>>> On Mon, 18 Oct 2010 20:31:38 -0400, Kenneth Heafield
>>> <mo...@kheafield.com>
>>> wrote:
>>>> Hi Moses,
>>>>
>>>>    Introducing kenlm in Moses trunk.  You no longer need to download a
>>>> separate language model to use Moses; it's distributed with Moses and
>>>> compiled in by default on UNIX.  This is threadsafe language model
>>>> inference code that returns the same probabilities as SRI (up to
>>>> floating point rounding).  It loads ARPA files in 2/3 the time SRI takes
>>>> and uses less memory too.  Using kenlm is simple: in your [lmodel-file]
>>>> section, change the first digit to 8.  For example,
>>>>
>>>> "0 0 2 foo.arpa" changes to "8 0 2 foo.arpa"
>>>>
>>>>    For even faster loading, use the binary format:
>>>>
>>>> kenlm/build_binary foo.arpa foo.binary
>>>>
>>>> then simply provide the binary filename in your moses.ini, e.g.
>>>> "8 0 2 foo.binary"; it auto-detects binary files using magic bytes at
>>>> the beginning.
>>>>
>>>>    The code is ready for use and provides correct results.  Inference is
>>>> slower than it should be due to inefficiencies in the Moses-side wrapper
>>>> code (it does a vocab lookup for all 5 words every time).  I'm working
>>>> on it and once this is done I'll post some benchmarks against SRI and
>>>> IRST.  The binary format is subject to change, but contains a version
>>>> number, so on the rare occasions when it changes, new versions will tell
>>>> you to rebuild your binary files.  Windows is currently not supported (it
>>>> uses mmap) though I welcome contributions using #ifdef and
>>>> CreateFileMapping.
>>>>
>>>>    Have fun and let me know about your experiences with it.
>>>>
>>>> "Ken"
>>>> _______________________________________________
>>>> Moses-support mailing list
>>>> Moses-support@mit.edu
>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
