Re: [Moses-support] KenLM distributed with Moses

2010-11-02 Thread Lee Ball (Applied Language)
Hi, I haven't seen this mentioned on the list: under what license(s) is KenLM
distributed?

Kind regards,

Lee Ball
Infrastructure Manager
lee.b...@appliedlanguage.com
Skype ID: lee.ball_appliedlanguage
Tel: +44 (0)844 854 8945

Applied Language Solutions
High quality language solutions delivered on time...with a smile!

www.appliedlanguage.com
Tel (UK): +44 (0)845 367 7000
Tel (US): +1 (800) 579-5010

Riverside Court, Huddersfield Road, Delph, Oldham, OL3 5FZ. UK
Registered in the UK 5122429




On 19 October 2010 01:31, Kenneth Heafield mo...@kheafield.com wrote:

 Hi Moses,

 Introducing kenlm in Moses trunk.  You no longer need to download a
 separate language model toolkit to use Moses; kenlm is distributed with
 Moses and compiled in by default on UNIX.  It is thread-safe language
 model inference code that returns the same probabilities as SRI (up to
 floating-point rounding).  It loads ARPA files in 2/3 of the time SRI
 takes and uses less memory too.  Using kenlm is simple: in the
 [lmodel-file] section of your moses.ini, change the first digit to 8.
 For example,

 0 0 2 foo.arpa changes to 8 0 2 foo.arpa
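If several moses.ini files need switching, the edit can be scripted. The sketch below is a hypothetical helper, not part of Moses. It assumes each non-comment entry in the [lmodel-file] section is whitespace-separated with the LM implementation number as its first field, as in the example above, and that 8 selects kenlm.

```python
# Hypothetical helper (not part of Moses): point every [lmodel-file] entry
# at kenlm by setting its first field to 8.
# Assumption: entries look like "<implementation> <factor> <order> <path>".
KENLM_ID = "8"


def switch_lm_to_kenlm(ini_text):
    out = []
    in_lmodel = False
    for line in ini_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # A new section header: remember whether it is [lmodel-file].
            in_lmodel = stripped == "[lmodel-file]"
            out.append(line)
            continue
        fields = stripped.split()
        if in_lmodel and len(fields) >= 4 and not stripped.startswith("#"):
            fields[0] = KENLM_ID
            out.append(" ".join(fields))
        else:
            # Other sections, blank lines and comments pass through unchanged.
            out.append(line)
    return "\n".join(out) + "\n"


example = "[lmodel-file]\n0 0 2 foo.arpa\n\n[weight-l]\n0.5\n"
print(switch_lm_to_kenlm(example), end="")
```

Only the first field is rewritten; the factor, order and path are left exactly as they were.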

For even faster loading, use the binary format:

 kenlm/build_binary foo.arpa foo.binary

 then simply provide the binary filename in your moses.ini e.g.
 8 0 2 foo.binary; it auto detects binary files using magic bytes at
 the beginning.
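Auto-detection of this kind amounts to peeking at the first bytes of the file. Below is a minimal Python sketch of the idea. kenlm's actual magic bytes are not given in this thread, so `BINARY_MAGIC` is a made-up value for illustration, and the real check may differ.

```python
import os
import tempfile

# Made-up magic prefix for illustration only; not kenlm's real magic bytes.
BINARY_MAGIC = b"example-lm-binary\0"


def looks_binary(path):
    """True if the file starts with BINARY_MAGIC; otherwise treat it as ARPA text."""
    with open(path, "rb") as f:
        head = f.read(len(BINARY_MAGIC))
    return head == BINARY_MAGIC


# Usage: write one fake file of each kind and classify them.
tmpdir = tempfile.mkdtemp()
arpa_path = os.path.join(tmpdir, "foo.arpa")
bin_path = os.path.join(tmpdir, "foo.binary")
with open(arpa_path, "w") as f:
    f.write("\n\\data\\\nngram 1=3\n")
with open(bin_path, "wb") as f:
    f.write(BINARY_MAGIC + b"\x00" * 16)

print(looks_binary(arpa_path), looks_binary(bin_path))  # False True
```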

 The code is ready for use and provides correct results.  Inference is
 slower than it should be because of inefficiencies in the Moses-side
 wrapper code (it does a vocab lookup for all 5 words every time).  I'm
 working on it, and once this is done I'll post some benchmarks against
 SRI and IRST.  The binary format is subject to change, but it contains a
 version number, so on the rare occasions it does change, new versions
 will tell you to rebuild your binary files.  Windows is currently not
 supported (kenlm uses mmap), though I welcome contributions using #ifdef
 and CreateFileMapping.

Have fun and let me know about your experiences with it.

 Ken
 ___
 Moses-support mailing list
 Moses-support@mit.edu
 http://mailman.mit.edu/mailman/listinfo/moses-support
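On the Windows point: the announcement says kenlm relies on mmap and invites contributions using CreateFileMapping. For comparison only, Python's standard mmap module already hides that split (it uses mmap() on UNIX and CreateFileMapping/MapViewOfFile on Windows), so a read-only mapping is written the same way on both. This is an illustrative sketch with a throwaway file, not kenlm code.

```python
import mmap
import os
import tempfile


def map_read_only(path):
    """Map a whole file read-only. Python's mmap module calls mmap() on UNIX
    and CreateFileMapping/MapViewOfFile on Windows, so this works on both."""
    with open(path, "rb") as f:
        # Length 0 maps the entire (non-empty) file; the mapping keeps its own
        # handle, so it remains usable after the file object is closed.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


# Usage with a small throwaway file standing in for a binary model.
path = os.path.join(tempfile.mkdtemp(), "model.binary")
with open(path, "wb") as f:
    f.write(b"\x01\x02\x03\x04")

m = map_read_only(path)
first_two = m[:2]  # slicing reads straight from the mapped pages
print(len(m), first_two)
m.close()
```

In C++ the equivalent portability layer would be the #ifdef around mmap and CreateFileMapping that the announcement asks for.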



Re: [Moses-support] KenLM distributed with Moses

2010-10-29 Thread support
Ken,

Your new enhancements ROCK! Here are some numbers using rev 3675 and
IRSTLM 5.50.01

Machine: Core2Quad, 2.4 GHz, 4 GB RAM
Data: EN-NL sample data, 37,500 segments (micro test sample),
  3-gram LM, 3-gram tables (for fast testing)

1. LM trained with SRILM; tables/tuning/evaluation with Moses + SRILM,
   multi-threading enabled
   Time: 75 minutes, BLEU: 0.2531

2. LM trained with IRSTLM; tables/tuning/evaluation with Moses + IRSTLM,
   binarized, memory-mapped, single thread
   Time: 195 minutes, BLEU: 0.2496

3. LM trained with IRSTLM (ARPA); tables/tuning/evaluation with
   Moses + KenLM, binarized, memory-mapped, multi-threaded
   Time: 50 minutes, BLEU: 0.2514






Re: [Moses-support] KenLM distributed with Moses

2010-10-29 Thread Kenneth Heafield
Thanks for sharing!  Looks like building my Moses system from scratch
finally finished, so I'll be making some memory benchmarks today too.

Just so I understand, you ran separate MERT for each of your three
cases?  Then MERT randomness should explain the insignificant difference
in BLEU between result 1 and result 3.

Kenneth
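To put numbers on the comparison, the figures from the benchmark message can be set side by side. This only restates the reported results. Run 2 was single-threaded, so its wall-clock time is not a like-for-like LM comparison, and Kenneth attributes the small BLEU gaps to MERT randomness.

```python
# Reported results (rev 3675, EN-NL sample, 3-gram LM): (name, minutes, BLEU).
runs = {
    1: ("SRILM, multi-threaded", 75, 0.2531),
    2: ("IRSTLM, single thread", 195, 0.2496),
    3: ("KenLM, multi-threaded", 50, 0.2514),
}


def compare(runs, baseline):
    """Time ratio and BLEU difference of each run relative to `baseline`."""
    _, base_minutes, base_bleu = runs[baseline]
    return {
        run_id: (minutes / base_minutes, bleu - base_bleu)
        for run_id, (_, minutes, bleu) in runs.items()
    }


for run_id, (ratio, delta) in compare(runs, baseline=3).items():
    name = runs[run_id][0]
    print(f"run {run_id} ({name}): {ratio:.2f}x the KenLM time, BLEU {delta:+.4f}")
```

Run 1 took 1.5 times as long as run 3 and run 2 took 3.9 times as long, while the BLEU differences are under 0.002 in both directions.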



Re: [Moses-support] KenLM distributed with Moses

2010-10-29 Thread support
Yes, all scores and times were from scratch without reusing anything. 

Precision Translation Tools will announce a simpler way to build a Moses
system from scratch next week: essentially, from a minimal server
configuration to a completely installed Moses system in four steps and a
30-minute wait.

Stay tuned.

Tom



Re: [Moses-support] KenLM distributed with Moses

2010-10-27 Thread support
Thanks Ken for all your feedback, 

One more question. I'm using Moses with Boost. I uncommented the line
#define USE_BOOST in kenlm/util/string_piece.hh and recompiled Moses
without problems.

Then I uncommented #define USE_ICU, and ./configure fails with the error
log below. libicu-dev and libicu42 are installed on my system. Also, each
compile started from a clean Moses download.

Is USE_ICU usable or necessary with Moses? 

Thanks,
Tom


configure: Using Boost library
checking for boostlib = 1.36.0... yes
configure: Building threaded moses
checking whether the Boost::Thread library is available... yes
checking for exit in -lboost_thread-mt... yes
checking Ngram.h usability... yes
checking Ngram.h presence... yes
checking for Ngram.h... yes
checking for trigram_init in -loolm... yes
checking n_gram.h usability... yes
checking n_gram.h presence... yes
checking for n_gram.h... yes
checking lm/ngram.hh usability... no
checking lm/ngram.hh presence... yes
checking for lm/ngram.hh... no
configure: WARNING: lm/ngram.hh: present but cannot be compiled
configure: WARNING: lm/ngram.hh: check for missing prerequisite headers?
configure: WARNING: lm/ngram.hh: see the Autoconf documentation
configure: WARNING: lm/ngram.hh: section Present But Cannot Be Compiled
configure: WARNING: lm/ngram.hh: proceeding with the compiler's result
configure: error: Cannot find KEN-LM in yes




On Tue, 26 Oct 2010 12:48:13 -0400, Kenneth Heafield mo...@kheafield.com
wrote:
 Yes, I require <s> and </s> to appear in your ARPA file.  These tags are
 important from an output-quality perspective (BLEU etc.).  I'll put that
 in the documentation when I get around to writing it, but I personally
 think IRST should include them by default.
 
 Kenneth
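A quick way to see whether an ARPA file will hit this requirement is to look for the two tags among its unigrams. The function below is a hypothetical Python helper, not part of kenlm or IRSTLM. It assumes the usual ARPA layout: a `\1-grams:` section whose lines hold a log10 probability, the word, and an optional backoff weight.

```python
def missing_boundary_tags(arpa_text):
    """Return which of <s> and </s> are absent from the \\1-grams: section."""
    in_unigrams = False
    seen = set()
    for raw in arpa_text.splitlines():
        line = raw.strip()
        if line.startswith("\\"):
            # Section markers look like \data\, \1-grams:, \2-grams:, \end\
            in_unigrams = line == "\\1-grams:"
            continue
        if in_unigrams and line:
            fields = line.split()
            # Unigram lines: log10 prob, the word, optional backoff.
            if len(fields) >= 2:
                seen.add(fields[1])
    return [tag for tag in ("<s>", "</s>") if tag not in seen]


toy = (
    "\\data\\\nngram 1=3\n\n"
    "\\1-grams:\n-99\t<s>\t-0.5\n-0.7\tthe\t-0.3\n-0.9\tcat\n\n"
    "\\end\\\n"
)
print(missing_boundary_tags(toy))  # ['</s>']
```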
 

Re: [Moses-support] KenLM distributed with Moses

2010-10-27 Thread Kenneth Heafield
Revision 3671 introduces an updated version of kenlm.  Queries are
faster now (no more string vocab lookups, state is kept so backoffs cost
less).  The binary format has changed as a result; please rebuild your
binary files.  Timing is forthcoming.

Kenneth
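The two changes described here, no more string vocab lookups at query time and keeping state between queries, can be shown in miniature. The sketch below is a toy back-off bigram model in Python with hypothetical names; it is not kenlm's data structure or the Moses wrapper, only an illustration of converting words to IDs once and carrying the context forward as state.

```python
class ToyBigramLM:
    """Toy back-off bigram model (not kenlm): strings become integer IDs once,
    and scoring works on IDs with the previous word carried as state."""

    UNK = -1  # hypothetical ID for out-of-vocabulary words

    def __init__(self, unigrams, bigrams):
        # unigrams: word -> (log10 prob, log10 backoff); bigrams: (w1, w2) -> log10 prob
        self.vocab = {w: i for i, w in enumerate(unigrams)}
        self.uni = {self.vocab[w]: pb for w, pb in unigrams.items()}
        self.bi = {(self.vocab[a], self.vocab[b]): p for (a, b), p in bigrams.items()}

    def index(self, word):
        """The only string lookup: done once per word, before scoring."""
        return self.vocab.get(word, self.UNK)

    def score(self, state, word_id, unk_logprob=-100.0):
        """Return (log10 p(word | state), new state); state is the previous word ID."""
        if word_id == self.UNK:
            return unk_logprob, self.UNK
        if (state, word_id) in self.bi:
            return self.bi[(state, word_id)], word_id
        # Back off: context's backoff weight plus the word's unigram probability.
        backoff = self.uni[state][1] if state in self.uni else 0.0
        return backoff + self.uni[word_id][0], word_id


lm = ToyBigramLM(
    unigrams={"<s>": (-99.0, -0.5), "the": (-1.0, -0.3), "cat": (-2.0, 0.0), "</s>": (-1.5, 0.0)},
    bigrams={("<s>", "the"): -0.2, ("the", "cat"): -0.8},
)
ids = [lm.index(w) for w in "the cat </s>".split()]  # string lookups happen here only
state, total = lm.index("<s>"), 0.0
for word_id in ids:
    logp, state = lm.score(state, word_id)
    total += logp
print(round(total, 4))
```

Because the binary format stores whatever the query code needs, a change like this is also why existing binary files had to be rebuilt.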



Re: [Moses-support] KenLM distributed with Moses

2010-10-26 Thread support
Hi Ken, 

I created an iARPA file with IRSTLM using the options -n 3 (3-grams), -b
(include the <s> sentence boundary) and -d (subdictionary for n-grams).
Then I used IRSTLM's compile-lm with --text yes to convert it to ARPA format.

Finally, I ran build_binary to binarize the ARPA file for KenLM, and got
the following error:

$ build_binary arpa.en.lm arpa.en.binary
Reading lm.en.lm
5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
terminate called after throwing an instance of 'lm::FormatLoadException'
  what():  Expected blank line after 3-grams at byte 22348989 in file
arpa.en.lm
Aborted

What am I missing?

Thanks,
Tom


On Fri, 22 Oct 2010 10:15:21 -0400, Kenneth Heafield mo...@kheafield.com
wrote:
 KenLM is inference-only.  It cannot create ARPA files.  So you'll need
 to use your favorite toolkit to generate the ARPA.
 
 On 10/22/10 07:52, supp...@precisiontranslationtools.com wrote:
 Thanks Ken. Nice work. 
 
 Is there a way to train the ARPA-formatted LM with KenLM, or do we need to
 train with another tool, like SRILM, or convert IRSTLM output to full ARPA
 format?
 
 Thanks again,
 Tom
 
 
 


Re: [Moses-support] KenLM distributed with Moses

2010-10-26 Thread Nicola Bertoldi
The empty line after each n-gram block is not mandatory in the ARPA format
(see http://www.speech.sri.com/projects/srilm/manpages/ngram-format.5.html)
and IRSTLM does not produce it.


best regards,
Nicola Bertoldi
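One way a reader can tolerate this is to find the n-gram blocks by their `\N-grams:` headers rather than by blank lines. A minimal Python sketch of that approach follows; it is a hypothetical helper, not kenlm's actual parser, and it also cross-checks the counts declared in the `\data\` section against the entries actually found.

```python
import re

HEADER = re.compile(r"^\\(\d+)-grams:$")  # e.g. \3-grams:
COUNT = re.compile(r"^ngram (\d+)=(\d+)$")  # e.g. ngram 3=1234


def read_ngram_counts(arpa_text):
    """Return (declared, found): counts from \\data\\ and entries actually seen.

    Sections are recognised by their headers, so a missing blank line
    between blocks is not an error.
    """
    declared, found = {}, {}
    order = None
    for raw in arpa_text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines are optional separators
        m = COUNT.match(line)
        if m and order is None:
            declared[int(m.group(1))] = int(m.group(2))
            continue
        m = HEADER.match(line)
        if m:
            order = int(m.group(1))
            found[order] = 0
        elif line == "\\end\\":
            order = None
        elif order is not None:
            found[order] += 1
    return declared, found


# No blank lines between blocks, as IRSTLM writes it.
toy = (
    "\\data\\\nngram 1=3\nngram 2=2\n"
    "\\1-grams:\n-99\t<s>\t-0.5\n-1.0\tthe\t-0.3\n-1.5\t</s>\n"
    "\\2-grams:\n-0.2\t<s> the\n-0.4\tthe </s>\n"
    "\\end\\\n"
)
print(read_ngram_counts(toy))
```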



Re: [Moses-support] KenLM distributed with Moses

2010-10-26 Thread Kenneth Heafield
I've fixed this in revision 3657 and tested that it works with a toy
IRSTLM example.

Sorry about that,

Kenneth

P.S. a faster version is under code review and coming soon.



Re: [Moses-support] KenLM distributed with Moses

2010-10-26 Thread support
Thank you, Ken. I'll update my svn revision.

Tom

On Tue, 26 Oct 2010 10:18:17 -0400, Kenneth Heafield mo...@kheafield.com
wrote:
 I've fixed this in revision 3657 and tested that it works with a toy
 IRSTLM example.
 
 Sorry about that,
 
 Kenneth
 
 P.S. a faster version is under code review and coming soon.
 
 On 10/26/10 03:57, Nicola Bertoldi wrote:
 the empty line after each ngram-block is not mandatory in the ARPA
format
 (see
 http://www.speech.sri.com/projects/srilm/manpages/ngram-format.5.html)
 and IRSTLM does not produce it.
 
 
 best regards,
 Nicola Bertoldi
 
 On Oct 26, 2010, at 9:42 AM, supp...@precisiontranslationtools.com
 supp...@precisiontranslationtools.com wrote:
 
 Hi Ken,

 I'm created an iARPA file with IRSTLM using the options -n 3 (2
 grams), -b
 (include the s sentence boundary) and -d (subdictionary for ngrams).
 Then, I used IRSTLM's compile-lm with --text yes to convert to ARPA
 format.

 Finally, I ran build_binary to binarize the ARPA format for KenLM. I
got
 the following error:

 $ build_binary arpa.en.lm arpa.en.binary
 Reading lm.en.lm

5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100

 terminate called after throwing an instance of
'lm::FormatLoadException'
   what():  Expected blank line after 3-grams at byte 22348989 in file
 arpa.en.lm
 Aborted

 What am I missing?

 Thanks,
 Tom


 ___
 Moses-support mailing list
 Moses-support@mit.edu
 http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] KenLM distributed with Moses

2010-10-26 Thread support
Thanks Ken. I tested it and it works. 

FYI, on my first attempt there was a different error. Something about the
<s> token (word?) was missing. I added the <s> and </s> tags and re-ran IRSTLM's
build-lm.sh script with option -b (include sentence boundary n-grams) and
the error disappeared.

It's pretty fast now. I look forward to testing the optimized code.

Tom



On Tue, 26 Oct 2010 10:18:17 -0400, Kenneth Heafield mo...@kheafield.com
wrote:
 I've fixed this in revision 3657 and tested that it works with a toy
 IRSTLM example.
 
 Sorry about that,
 
 Kenneth
 
 P.S. a faster version is under code review and coming soon.
 
 On 10/26/10 03:57, Nicola Bertoldi wrote:
 The empty line after each ngram-block is not mandatory in the ARPA format
 (see http://www.speech.sri.com/projects/srilm/manpages/ngram-format.5.html)
 and IRSTLM does not produce it.
 
 
 best regards,
 Nicola Bertoldi
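Nicola's point suggests that a reader should treat blank lines as optional rather than as required section terminators. A minimal Python sketch of such a reader (not KenLM's actual parser, which is C++), assuming whitespace-separated fields and the layout described in the SRILM ngram-format(5) page: a `\data\` header with `ngram N=count` lines, `\N-grams:` sections of `logprob words [backoff]`, and a closing `\end\`.

```python
import re
from typing import Dict, Tuple

NGram = Tuple[str, ...]
# order -> {n-gram: (log10 probability, log10 backoff)}
Tables = Dict[int, Dict[NGram, Tuple[float, float]]]

_SECTION = re.compile(r"\\(\d+)-grams:")
_COUNT = re.compile(r"ngram\s+(\d+)\s*=\s*(\d+)")


def parse_arpa(text: str) -> Tuple[Dict[int, int], Tables]:
    """Parse ARPA text, treating blank lines as optional everywhere."""
    counts: Dict[int, int] = {}
    tables: Tables = {}
    section = None  # None before \data\, "data" in the header, else the n-gram order
    for lineno, raw in enumerate(text.splitlines(), 1):
        line = raw.strip()
        if not line:
            continue  # blank separators may or may not be present
        if line == "\\data\\":
            section = "data"
            continue
        if line == "\\end\\":
            break
        header = _SECTION.fullmatch(line)
        if header:
            section = int(header.group(1))
            tables[section] = {}
            continue
        if section == "data":
            m = _COUNT.fullmatch(line)
            if not m:
                raise ValueError(f"line {lineno}: bad count line {line!r}")
            counts[int(m.group(1))] = int(m.group(2))
        elif isinstance(section, int):
            n = section
            fields = line.split()
            if len(fields) not in (n + 1, n + 2):
                raise ValueError(f"line {lineno}: expected {n + 1} or {n + 2} fields")
            backoff = float(fields[n + 1]) if len(fields) == n + 2 else 0.0
            tables[n][tuple(fields[1:n + 1])] = (float(fields[0]), backoff)
        # any text before \data\ is ignored
    # Cross-check the header counts against what was actually read
    for n, expected in counts.items():
        found = len(tables.get(n, {}))
        if found != expected:
            raise ValueError(f"{n}-grams: header says {expected}, found {found}")
    return counts, tables
```

Because blank lines are skipped rather than expected, the same model parses identically whether or not the writer separates the n-gram blocks with empty lines.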
 


Re: [Moses-support] KenLM distributed with Moses

2010-10-26 Thread Kenneth Heafield
Yes, I require <s> and </s> to appear in your ARPA.  These tags are
important from an output quality perspective (BLEU, etc.).  I'll put that
in the documentation when I get around to writing it, but I personally
think IRST should include them by default.

Kenneth
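This requirement can be checked before a file is handed to build_binary. A minimal Python sketch, assuming a plain-text ARPA file: `add_boundaries` mirrors what Tom describes (adding the tags to the training text before re-running build-lm.sh), and `missing_boundary_tokens` reports which of `<s>` and `</s>` are absent from the unigram section. Both helpers are hypothetical and are not part of IRSTLM or KenLM.

```python
from typing import Iterable, Iterator, List

REQUIRED = ("<s>", "</s>")


def add_boundaries(sentences: Iterable[str]) -> Iterator[str]:
    """Wrap each training sentence in <s> ... </s> before building the LM."""
    for s in sentences:
        yield f"<s> {s.strip()} </s>"


def missing_boundary_tokens(arpa_text: str) -> List[str]:
    """Return the sentence-boundary tokens absent from the 1-grams section."""
    in_unigrams = False
    seen = set()
    for raw in arpa_text.splitlines():
        line = raw.strip()
        if line.startswith("\\"):
            # Any section header (\data\, \2-grams:, \end\ ...) ends the unigram block
            in_unigrams = line == "\\1-grams:"
            continue
        if in_unigrams and line:
            fields = line.split()
            if len(fields) >= 2:
                seen.add(fields[1])  # field 0 is the log10 probability
    return [tok for tok in REQUIRED if tok not in seen]
```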



Re: [Moses-support] KenLM distributed with Moses

2010-10-22 Thread support
Thanks Ken. Nice work.

Is there a way to train an ARPA-format LM with KenLM, or do we need to train
with another tool like SRILM, or convert IRSTLM output to full ARPA format?

Thanks again,
Tom





Re: [Moses-support] KenLM distributed with Moses

2010-10-22 Thread Kenneth Heafield
KenLM is inference-only.  It cannot create ARPA files.  So you'll need
to use your favorite toolkit to generate the ARPA.
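Inference-only here means querying probabilities from a model that another toolkit has already estimated. For a backoff model stored in ARPA form, that query is the textbook backoff recursion: use the longest matching n-gram, adding the backoff weight of each history that had to be shortened. A minimal Python sketch over a hand-written toy model (all numbers are invented for illustration), assuming out-of-vocabulary words are mapped to `<unk>`; KenLM's own data structures and edge-case handling are not shown here.

```python
from typing import Dict, Sequence, Tuple

NGram = Tuple[str, ...]
Tables = Dict[int, Dict[NGram, Tuple[float, float]]]


def log10_prob(tables: Tables, context: Sequence[str], word: str) -> float:
    """log10 p(word | context) using standard ARPA backoff.

    `tables` maps order -> {n-gram: (log10 prob, log10 backoff)}; a history
    that is not listed contributes a backoff of 0.0 (a factor of 1).
    """
    if (word,) not in tables[1]:
        word = "<unk>"  # assumption: the model has an <unk> unigram
    context = tuple(context)
    backoff_total = 0.0
    # Try the longest history first, then drop the oldest word each step
    for start in range(len(context) + 1):
        history = context[start:]
        entry = tables.get(len(history) + 1, {}).get(history + (word,))
        if entry is not None:
            return backoff_total + entry[0]
        # n-gram absent: pay the backoff weight of the history, if listed
        hist_entry = tables.get(len(history), {}).get(history)
        if hist_entry is not None:
            backoff_total += hist_entry[1]
    raise KeyError(f"no unigram entry for {word!r}")


# Toy bigram model; the values are illustrative only.
TOY: Tables = {
    1: {("<s>",): (-99.0, -0.5), ("</s>",): (-1.0, 0.0), ("a",): (-0.7, -0.3),
        ("b",): (-0.9, -0.2), ("<unk>",): (-2.0, 0.0)},
    2: {("<s>", "a"): (-0.2, -0.1), ("a", "b"): (-0.4, 0.0)},
}
```

For example, `log10_prob(TOY, ["a"], "a")` finds no bigram "a a", so it adds the backoff of "a" (-0.3) to the unigram log probability of "a" (-0.7), giving -1.0.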



Re: [Moses-support] KenLM distributed with Moses

2010-10-22 Thread support
Thanks, Ken.

Tom



[Moses-support] KenLM distributed with Moses

2010-10-18 Thread Christof Pintaske
Hi,

I saw that the KenLM source code is distributed in the Moses svn and can
be set in configure. Is anybody here using it and willing to share some
experiences? Is it thread-safe, and can it be used in Moses together with SRI
and IRST? Any particular advantages? Is there any more information than
just the README?

Any hints are very welcome,
Christof




[Moses-support] KenLM distributed with Moses

2010-10-18 Thread Kenneth Heafield
Hi Moses,

Introducing kenlm in Moses trunk.  You no longer need to download a
separate language model to use Moses; it's distributed with Moses and
compiled in by default on UNIX.  This is thread-safe language model
inference code that returns the same probabilities as SRI (up to
floating-point rounding).  It loads ARPA files in 2/3 the time SRI takes
and uses less memory too.  Using kenlm is simple: in your [lmodel-file]
section, change the first digit to 8.  For example,

0 0 2 foo.arpa changes to 8 0 2 foo.arpa
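The edit is mechanical, so it can be scripted across many configs. A minimal Python sketch, assuming the moses.ini layout shown above (a `[lmodel-file]` header followed by one whitespace-separated entry per line, with `#` marking comments); it changes only the first field and does not interpret the remaining ones. The function name is hypothetical.

```python
KENLM_CODE = "8"  # first-field value that selects KenLM, per the announcement


def switch_lmodel_to_kenlm(ini_text: str) -> str:
    """Set the first field of every [lmodel-file] entry to 8; other lines are untouched."""
    out = []
    in_lmodel = False
    for line in ini_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # A new section header: track whether we are inside [lmodel-file]
            in_lmodel = stripped == "[lmodel-file]"
        elif in_lmodel and stripped and not stripped.startswith("#"):
            fields = stripped.split()
            fields[0] = KENLM_CODE  # e.g. "0 0 2 foo.arpa" -> "8 0 2 foo.arpa"
            line = " ".join(fields)
        out.append(line)
    # Preserve a trailing newline if the input had one
    return "\n".join(out) + ("\n" if ini_text.endswith("\n") else "")
```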

For even faster loading, use the binary format:

kenlm/build_binary foo.arpa foo.binary

then simply provide the binary filename in your moses.ini, e.g.
8 0 2 foo.binary; it auto-detects binary files using magic bytes at
the beginning.
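Checking a few leading bytes is enough to tell the two formats apart. A minimal Python sketch, assuming a placeholder magic string (`BINARY_MAGIC` is hypothetical; the real value is whatever build_binary writes) and treating a file whose first non-blank text is `\data\` as ARPA. An ARPA file with free text before `\data\` would be reported as "unknown" by this sketch.

```python
# Hypothetical placeholder: KenLM's real magic string is not reproduced here.
BINARY_MAGIC = b"EXAMPLE-LM-BINARY"


def detect_lm_format(head: bytes, magic: bytes = BINARY_MAGIC) -> str:
    """Classify the first bytes of a language model file.

    Returns "binary" if the bytes start with `magic`, "arpa" if the first
    non-whitespace text is the ARPA data header, else "unknown".
    """
    if head.startswith(magic):
        return "binary"
    if head.lstrip().startswith(b"\\data\\"):
        return "arpa"
    return "unknown"


def detect_file(path: str, magic: bytes = BINARY_MAGIC) -> str:
    """Read just enough of the file to classify it."""
    with open(path, "rb") as f:
        # The magic plus some room for leading whitespace before \data\
        return detect_lm_format(f.read(len(magic) + 4096), magic)
```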

The code is ready for use and provides correct results.  Inference is
slower than it should be due to inefficiencies in the Moses-side wrapper
code (it does a vocab lookup for all 5 words every time).  I'm working
on it and once this is done I'll post some benchmarks against SRI and
IRST. The binary format is subject to change, but it contains a version
number, so on the rare occasions it does change, new versions will tell you to
rebuild your binary files.  Windows is currently not supported (it uses
mmap) though I welcome contributions using #ifdef and CreateFileMapping.
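The versioning scheme described here can be illustrated with a short header check. A minimal Python sketch, assuming a hypothetical header layout (magic bytes followed by a little-endian 32-bit version number) and an arbitrary supported version of 3; it shows the idea of refusing stale files with a rebuild message, not the actual KenLM binary format.

```python
import struct

# Hypothetical header layout for illustration: magic bytes, then a
# little-endian uint32 format version. This is not KenLM's actual layout.
MAGIC = b"EXAMPLE-LM-BINARY"
SUPPORTED_VERSION = 3


class StaleBinaryError(Exception):
    """Raised when a binary LM was written with an incompatible format version."""


def write_header(version: int = SUPPORTED_VERSION) -> bytes:
    return MAGIC + struct.pack("<I", version)


def check_header(head: bytes) -> int:
    """Return the version if it is supported; otherwise ask for a rebuild."""
    if not head.startswith(MAGIC):
        raise ValueError("not a binary language model")
    if len(head) < len(MAGIC) + 4:
        raise ValueError("truncated header")
    (version,) = struct.unpack_from("<I", head, len(MAGIC))
    if version != SUPPORTED_VERSION:
        raise StaleBinaryError(
            f"binary format version {version}, but this build reads version "
            f"{SUPPORTED_VERSION}; please rebuild it from the ARPA file with build_binary"
        )
    return version
```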

Have fun and let me know about your experiences with it.

Ken