Re: [Moses-support] KenLM distributed with Moses

Kenneth Heafield Fri, 29 Oct 2010 07:18:35 -0700

Thanks for sharing!  Looks like building my Moses system from scratch
finally finished, so I'll be making some memory benchmarks today too.


Just so I understand, you ran separate MERT for each of your three
cases?  Then MERT randomness should explain the insignificant difference
in BLEU between result 1 and result 3.

Kenneth

On 10/29/10 10:06, supp...@precisiontranslationtools.com wrote:
> Ken,
> 
> Your new enhancements ROCK! Here are some numbers using rev 3675 and
> IRSTLM 5.50.01
> 
> Machine: Core2Quad, 2.4 GHz, 4 GB RAM
> Data: EN-NL sample data, 37,500 segments (micro test sample)
>       3 gram LM, 3 gram tables (for fast testing)
> 
> Train LM with SRILM & 
> Train tables/tune/eval with
> Moses/SRILM
> multi-threading enabled:          75 minutes
> BLEU Score:                       0.2531
> 
> Train LM with IRSTLM
> Train tables/tune/eval with
> Moses/IRSTLM, binarized mmap,
> single thread:                    195 minutes
> BLEU Score:                       0.2496
> 
> Train LM with IRSTLM (ARPA)
> Train tables/tune/eval with
> Moses/KenLM, binarized mmap,
> multi-threaded:                   50 minutes
> BLEU Score:                       0.2514
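
To put the three runs above side by side, here is a small Python sketch that expresses each wall-clock time as a multiple of the KenLM run and each BLEU score as a difference from it. The numbers are copied from the report; the names RUNS and compare are made up for illustration. Note that the runs cover training, tuning and evaluation end to end and differ in threading, so the ratios are not a pure language-model speed comparison.

```python
# Wall-clock minutes and BLEU copied from the three runs reported above.
RUNS = {
    "SRILM, multi-threaded": (75, 0.2531),
    "IRSTLM, single thread": (195, 0.2496),
    "KenLM, multi-threaded": (50, 0.2514),
}


def compare(runs, baseline):
    """Return {name: (time ratio vs. baseline, BLEU difference vs. baseline)}."""
    base_minutes, base_bleu = runs[baseline]
    return {
        name: (minutes / base_minutes, bleu - base_bleu)
        for name, (minutes, bleu) in runs.items()
    }


if __name__ == "__main__":
    for name, (ratio, delta) in compare(RUNS, "KenLM, multi-threaded").items():
        print(f"{name}: {ratio:.2f}x the KenLM time, BLEU {delta:+.4f}")
```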
> 
> 
> 
> 
> On Wed, 27 Oct 2010 14:15:39 -0400, Kenneth Heafield <mo...@kheafield.com>
> wrote:
>> Revision 3671 introduces an updated version of kenlm.  Queries are
>> faster now (no more string vocab lookups, state is kept so backoffs cost
>> less).  The binary format has changed as a result; please rebuild your
>> binary files.  Timing is forthcoming.
>>
>> Kenneth
>>
>> On 10/18/10 20:31, Kenneth Heafield wrote:
>>> Hi Moses,
>>>
>>>     Introducing kenlm in Moses trunk.  You no longer need to download a
>>> separate language model to use Moses; it's distributed with Moses and
>>> compiled in by default on UNIX.  This is threadsafe language model
>>> inference code that returns the same probabilities as SRI (up to
>>> floating point rounding).  It loads ARPA files in 2/3 the time SRI
>>> takes and uses less memory too.  Using kenlm is simple: in your
>>> [lmodel-file] section, change the first digit to 8.  For example,
>>>
>>> "0 0 2 foo.arpa" changes to "8 0 2 foo.arpa"
>>>
>>>     For even faster loading, use the binary format:
>>>
>>> kenlm/build_binary foo.arpa foo.binary
>>>
>>> then simply provide the binary filename in your moses.ini e.g.
>>> "8 0 2 foo.binary"; it auto detects binary files using magic bytes at
>>> the beginning.
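
The moses.ini change described above can be scripted. The following is a minimal Python sketch, not part of Moses or kenlm: the helper names use_kenlm and rewrite_ini are hypothetical, and it assumes each [lmodel-file] entry has exactly four whitespace-separated fields with the LM type first and the filename last, as in the "0 0 2 foo.arpa" example (so filenames containing spaces are not handled).

```python
KENLM_TYPE = "8"  # LM type digit for kenlm, as given in the message above


def use_kenlm(entry, binary_path=None):
    """Switch one [lmodel-file] entry to kenlm, optionally swapping the filename."""
    fields = entry.split()
    if len(fields) != 4:
        # Assumption: four fields per entry, matching the example in the message.
        raise ValueError(f"unexpected lmodel-file entry: {entry!r}")
    fields[0] = KENLM_TYPE
    if binary_path is not None:
        fields[3] = binary_path
    return " ".join(fields)


def rewrite_ini(text):
    """Apply use_kenlm to every entry in the [lmodel-file] section of a moses.ini."""
    out = []
    in_lm_section = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # A new section header starts; track whether it is the LM section.
            in_lm_section = stripped == "[lmodel-file]"
        elif in_lm_section and stripped and not stripped.startswith("#"):
            line = use_kenlm(stripped)
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    print(use_kenlm("0 0 2 foo.arpa"))
    print(use_kenlm("0 0 2 foo.arpa", "foo.binary"))
```

Run as a script, this prints "8 0 2 foo.arpa" and "8 0 2 foo.binary", the two forms quoted in the message. The binary file itself still has to be produced first with kenlm/build_binary.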
>>>
>>>     The code is ready for use and provides correct results.  Inference is
>>> slower than it should be due to inefficiencies in the Moses-side wrapper
>>> code (it does a vocab lookup for all 5 words every time).  I'm working
>>> on it and once this is done I'll post some benchmarks against SRI and
>>> IRST.  The binary format is subject to change, but it contains a version
>>> number, so on the rare occasions when it does change, new versions will
>>> tell you to rebuild your binary files.  Windows is currently not
>>> supported (it uses mmap), though I welcome contributions using #ifdef
>>> and CreateFileMapping.
>>>
>>>     Have fun and let me know about your experiences with it.
>>>
>>> "Ken"
>>> _______________________________________________
>>> Moses-support mailing list
>>> Moses-support@mit.edu
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
