Ken,

Your new enhancements ROCK! Here are some numbers using rev 3675 and
IRSTLM 5.50.01

Machine: Core2Quad, 2.4 GHz, 4 GB RAM
Data: EN-NL sample data, 37,500 segments (micro test sample)
      3 gram LM, 3 gram tables (for fast testing)

Train LM with SRILM & 
Train tables/tune/eval with
Moses/SRILM
multi-threading enabled:          75 minutes
BLEU Score:                       0.2531

Train LM with IRSTLM
Train tables/tune/eval with
Moses/IRSTLM, binarized memmap,
single thread:                    195 minutes
BLEU Score:                       0.2496

Train LM with IRSTLM (ARPA)
Train tables/tune/eval with
Moses/KenLM, binarized memmap,
multi-threaded:                   50 minutes
BLEU Score:                       0.2514
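
For anyone who wants to compare the three runs side by side, here is a rough Python sketch that computes each run's speedup and BLEU difference relative to the SRILM run. The minutes and BLEU scores are copied from the numbers above; the names `runs`, `BASELINE` and `summarize` are only illustrative and are not part of Moses, SRILM, IRSTLM or KenLM.

```python
# Wall-clock minutes and BLEU scores copied from the three runs above.
# The labels are shorthand for this sketch only.
runs = {
    "SRILM LM, Moses/SRILM, multi-threaded": (75, 0.2531),
    "IRSTLM LM, Moses/IRSTLM, single thread": (195, 0.2496),
    "IRSTLM LM (ARPA), Moses/KenLM, multi-threaded": (50, 0.2514),
}

BASELINE = "SRILM LM, Moses/SRILM, multi-threaded"


def summarize(runs, baseline):
    """Return (name, minutes, speedup, bleu_delta) for every run.

    speedup > 1 means the run finished faster than the baseline;
    bleu_delta is the run's BLEU minus the baseline's BLEU.
    """
    base_minutes, base_bleu = runs[baseline]
    rows = []
    for name, (minutes, bleu) in runs.items():
        speedup = round(base_minutes / minutes, 2)
        bleu_delta = round(bleu - base_bleu, 4)
        rows.append((name, minutes, speedup, bleu_delta))
    return rows


for name, minutes, speedup, bleu_delta in summarize(runs, BASELINE):
    print(f"{name}: {minutes} min, {speedup}x vs baseline, BLEU {bleu_delta:+.4f}")
```

Keep in mind this is a single run per configuration on a micro test sample, and the IRSTLM run was single-threaded, so the ratios are only a rough indication.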




On Wed, 27 Oct 2010 14:15:39 -0400, Kenneth Heafield <mo...@kheafield.com>
wrote:
> Revision 3671 introduces an updated version of kenlm.  Queries are
> faster now (no more string vocab lookups, state is kept so backoffs cost
> less).  The binary format has changed as a result; please rebuild your
> binary files.  Timing is forthcoming.
> 
> Kenneth
> 
> On 10/18/10 20:31, Kenneth Heafield wrote:
>> Hi Moses,
>> 
>>      Introducing kenlm in Moses trunk.  You no longer need to download a
>> separate language model to use Moses; it's distributed with Moses and
>> compiled in by default on UNIX.  This is threadsafe language model
>> inference code that returns the same probabilities as SRI (up to
>> floating point rounding).  It loads ARPA files in 2/3 the time SRI takes
>> and uses less memory too.  Using kenlm is simple: in your [lmodel-file]
>> section, change the first digit to 8.  For example,
>> 
>> "0 0 2 foo.arpa" changes to "8 0 2 foo.arpa"
>> 
>>      For even faster loading, use the binary format:
>> 
>> kenlm/build_binary foo.arpa foo.binary
>> 
>> then simply provide the binary filename in your moses.ini e.g.
>> "8 0 2 foo.binary"; it auto detects binary files using magic bytes at
>> the beginning.
>> 
>>      The code is ready for use and provides correct results.  Inference is
>> slower than it should be due to inefficiencies in the Moses-side wrapper
>> code (it does a vocab lookup for all 5 words every time).  I'm working
>> on it and once this is done I'll post some benchmarks against SRI and
>> IRST. The binary format is subject to change, but contains a version
>> number, so on the very rare occasions when it changes, new versions will
>> tell you to rebuild your binary files.  Windows is currently not supported
>> (it uses mmap) though I welcome contributions using #ifdef and
>> CreateFileMapping.
>> 
>>      Have fun and let me know about your experiences with it.
>> 
>> "Ken"
>> _______________________________________________
>> Moses-support mailing list
>> Moses-support@mit.edu
>> http://mailman.mit.edu/mailman/listinfo/moses-support
