Re: [Moses-support] KenLM distributed with Moses

2010-11-02 Thread Lee Ball (Applied Language)
Hi, I've not seen it on this list, but which license is KenLM distributed under? Kind regards, Lee Ball

Re: [Moses-support] KenLM distributed with Moses

2010-10-29 Thread support
Ken, your new enhancements ROCK! Here are some numbers using rev 3675 and IRSTLM 5.50.01. Machine: Core2Quad, 2.4 GHz, 4 GB RAM. Data: EN-NL sample data, 37,500 segments (micro test sample). 3-gram LM, 3-gram tables (for fast testing). Train LM with SRILM. Train tables/tune/eval with

Re: [Moses-support] KenLM distributed with Moses

2010-10-29 Thread Kenneth Heafield
Thanks for sharing! Looks like building my Moses system from scratch finally finished, so I'll be making some memory benchmarks today too. Just so I understand, you ran separate MERT for each of your three cases? Then MERT randomness should explain the insignificant difference in BLEU between

Re: [Moses-support] KenLM distributed with Moses

2010-10-29 Thread support
Yes, all scores and times were from scratch without reusing anything. Precision Translation Tools will announce a simpler solution for building a Moses system from scratch next week. Essentially, from minimal server configuration to completely installed Moses system in four steps and 30 minute

Re: [Moses-support] KenLM distributed with Moses

2010-10-27 Thread support
Thanks, Ken, for all your feedback. One more question: I'm using Moses with Boost. I uncommented the line #define USE_BOOST in kenlm/util/string_piece.hh and recompiled Moses without problems. Then I uncommented #define USE_ICU, and ./configure fails with the error log below. libicu-dev and

Re: [Moses-support] KenLM distributed with Moses

2010-10-27 Thread Kenneth Heafield
Revision 3671 introduces an updated version of kenlm. Queries are faster now (no more string vocab lookups, state is kept so backoffs cost less). The binary format has changed as a result; please rebuild your binary files. Timing is forthcoming. Kenneth On 10/18/10 20:31, Kenneth Heafield
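The two speedups mentioned above (integer vocabulary ids instead of repeated string lookups, and a state that carries backoff information between queries) can be illustrated with a toy bigram model. This is only a sketch of the general idea, not KenLM's actual code or API: the class ToyBackoffLM, its methods, and the probabilities are all hypothetical.

```python
class ToyBackoffLM:
    """Toy bigram backoff model (hypothetical; not KenLM's implementation)."""

    def __init__(self, unigrams, bigrams):
        # unigrams: {word: (log10_prob, backoff)}, bigrams: {(w1, w2): log10_prob}
        # Map each word string to an integer id once, up front.
        self.vocab = {w: i for i, w in enumerate(sorted(unigrams))}
        self.uni = {self.vocab[w]: pb for w, pb in unigrams.items()}
        self.bi = {(self.vocab[a], self.vocab[b]): p
                   for (a, b), p in bigrams.items()}

    def word_id(self, word):
        # The only string lookup; scoring below works on ids alone.
        return self.vocab[word]

    def begin_state(self):
        # State = (previous word id, that word's backoff weight).
        wid = self.vocab["<s>"]
        return (wid, self.uni[wid][1])

    def score(self, state, wid):
        """Return (log10 probability of wid given state, new state)."""
        prev_id, prev_backoff = state
        log_prob, backoff = self.uni[wid]
        bigram = self.bi.get((prev_id, wid))
        if bigram is not None:
            log_prob = bigram
        else:
            # Back off: the weight is already in the state, no extra lookup.
            log_prob += prev_backoff
        return log_prob, (wid, backoff)


lm = ToyBackoffLM(
    unigrams={"<s>": (-1.0, -0.5), "a": (-0.7, -0.3), "b": (-0.9, 0.0)},
    bigrams={("<s>", "a"): -0.2},
)
state = lm.begin_state()
p_a, state = lm.score(state, lm.word_id("a"))  # bigram "<s> a" is present
p_b, state = lm.score(state, lm.word_id("b"))  # "a b" missing, so back off
```

Because the state returned by each call already holds the backoff weight of the context, a query that has to back off does not need to look the context up again.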

Re: [Moses-support] KenLM distributed with Moses

2010-10-26 Thread support
Hi Ken, I created an iARPA file with IRSTLM using the options -n 3 (3-grams), -b (include the <s> sentence boundary) and -d (subdictionary for ngrams). Then, I used IRSTLM's compile-lm with --text yes to convert to ARPA format. Finally, I ran build_binary to binarize the ARPA format for KenLM.

Re: [Moses-support] KenLM distributed with Moses

2010-10-26 Thread Nicola Bertoldi
The empty line after each ngram-block is not mandatory in the ARPA format (see http://www.speech.sri.com/projects/srilm/manpages/ngram-format.5.html) and IRSTLM does not produce it. Best regards, Nicola Bertoldi On Oct 26, 2010, at 9:42 AM, supp...@precisiontranslationtools.com
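A reader that tolerates both layouts can key on the section headers rather than on blank lines. The following is a minimal sketch under that assumption; read_arpa is a hypothetical helper written for illustration, not code from KenLM, SRILM, or IRSTLM.

```python
import re

# Section headers look like "\1-grams:", "\2-grams:", ...
SECTION = re.compile(r"^\\(\d+)-grams:$")


def read_arpa(lines):
    """Parse ARPA text into {order: [(log10_prob, words, backoff), ...]}.

    Blank lines are ignored everywhere, so files with or without an empty
    line after each n-gram block parse identically.
    """
    ngrams = {}
    order = None
    for raw in lines:
        line = raw.strip()
        if not line or line == "\\data\\" or line.startswith("ngram "):
            continue
        if line == "\\end\\":
            break
        match = SECTION.match(line)
        if match:
            order = int(match.group(1))
            ngrams[order] = []
            continue
        if order is None:
            continue  # stray text before the first section
        fields = line.split()
        prob = float(fields[0])
        words = tuple(fields[1:1 + order])
        # The backoff weight is optional; treat a missing one as 0.0.
        backoff = float(fields[1 + order]) if len(fields) > 1 + order else 0.0
        ngrams[order].append((prob, words, backoff))
    return ngrams


sample = [
    "\\data\\", "ngram 1=2", "ngram 2=1", "",
    "\\1-grams:", "-1.0\t<s>\t-0.5", "-0.7\thello\t-0.3", "",
    "\\2-grams:", "-0.2\t<s> hello", "",
    "\\end\\",
]
no_blank_lines = [line for line in sample if line]
same = read_arpa(sample) == read_arpa(no_blank_lines)
```

Here `same` compares the SRILM-style layout (blank line after each block) with the layout without blank lines.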

Re: [Moses-support] KenLM distributed with Moses

2010-10-26 Thread Kenneth Heafield
I've fixed this in revision 3657 and tested that it works with a toy IRSTLM example. Sorry about that, Kenneth P.S. a faster version is under code review and coming soon. On 10/26/10 03:57, Nicola Bertoldi wrote: the empty line after each ngram-block is not mandatory in the ARPA format (see

Re: [Moses-support] KenLM distributed with Moses

2010-10-26 Thread support
Thank you, Ken. I'll update my svn revision. Tom On Tue, 26 Oct 2010 10:18:17 -0400, Kenneth Heafield mo...@kheafield.com wrote: I've fixed this in revision 3657 and tested that it works with a toy IRSTLM example. Sorry about that, Kenneth P.S. a faster version is under code review

Re: [Moses-support] KenLM distributed with Moses

2010-10-26 Thread support
Thanks Ken. I tested it and it works. FYI, on my first attempt there was a different error. Something about the <s> token (word?) was missing. I added the <s> and </s> tags and re-ran IRSTLM's build-lm.sh script with option -b (Include sentence boundary n-grams) and the error disappeared. It's pretty fast

Re: [Moses-support] KenLM distributed with Moses

2010-10-26 Thread Kenneth Heafield
Yes, I require <s> and </s> to appear in your ARPA. These tags are important from an output quality perspective (BLEU etc). I'll put that in the documentation when I get around to writing it, but personally think IRST should include them by default. Kenneth On 10/26/10 12:30,
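Before binarizing, an ARPA file can be checked for the sentence-boundary tokens by scanning its unigram section. This is a small sketch; missing_boundary_tokens is a hypothetical helper, not part of KenLM or IRSTLM, and it assumes the usual ARPA layout of probability, word, optional backoff on each unigram line.

```python
def missing_boundary_tokens(lines):
    """Return the subset of {"<s>", "</s>"} absent from the \\1-grams: section."""
    required = {"<s>", "</s>"}
    seen = set()
    in_unigrams = False
    for raw in lines:
        line = raw.strip()
        if line == "\\1-grams:":
            in_unigrams = True
            continue
        if line.startswith("\\"):
            if in_unigrams:
                break  # reached the next section or \end\
            continue
        if in_unigrams and line:
            fields = line.split()
            if len(fields) >= 2:
                seen.add(fields[1])  # fields: log10 prob, word, [backoff]
    return required - seen


complete = ["\\data\\", "ngram 1=3", "\\1-grams:",
            "-1.0\t<s>\t-0.5", "-0.7\thello\t-0.3", "-1.1\t</s>", "\\end\\"]
incomplete = [line for line in complete if "</s>" not in line]
```

With these samples, `missing_boundary_tokens(complete)` returns an empty set and `missing_boundary_tokens(incomplete)` returns `{"</s>"}`.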

Re: [Moses-support] KenLM distributed with Moses

2010-10-22 Thread support
Thanks Ken. Nice work. Is there a way to train the ARPA formatted LM with KenLM, or do we need to train with another tool, like SRILM or convert IRSTLM to full ARPA format? Thanks again, Tom On Mon, 18 Oct 2010 20:31:38 -0400, Kenneth Heafield mo...@kheafield.com wrote: Hi Moses,

Re: [Moses-support] KenLM distributed with Moses

2010-10-22 Thread Kenneth Heafield
KenLM is inference-only. It cannot create ARPA files. So you'll need to use your favorite toolkit to generate the ARPA. On 10/22/10 07:52, supp...@precisiontranslationtools.com wrote: Thanks Ken. Nice work. Is there a way to train the ARPA formatted LM with KenLM, or do we need to train

Re: [Moses-support] KenLM distributed with Moses

2010-10-22 Thread support
Thanks, Ken. Tom On Fri, 22 Oct 2010 10:15:21 -0400, Kenneth Heafield mo...@kheafield.com wrote: KenLM is inference-only. It cannot create ARPA files. So you'll need to use your favorite toolkit to generate the ARPA. On 10/22/10 07:52, supp...@precisiontranslationtools.com wrote: Thanks

[Moses-support] KenLM distributed with Moses

2010-10-18 Thread Christof Pintaske
Hi, I saw that the KenLM source code is distributed from the Moses svn and can be set in configure. Is anybody here using it and willing to share some experiences? Is it thread-safe and can it be used in Moses together with SRI and IRST? Any particular advantages? Is there any more information than

[Moses-support] KenLM distributed with Moses

2010-10-18 Thread Kenneth Heafield
Hi Moses, Introducing kenlm in Moses trunk. You no longer need to download a separate language model to use Moses; it's distributed with Moses and compiled in by default on UNIX. This is threadsafe language model inference code that returns the same probabilities as SRI (up to floating