Hi, I haven't seen this mentioned on the list, but what license is KenLM distributed
under?
Kind regards,
Lee Ball
Infrastructure Manager
lee.b...@appliedlanguage.com
Skype ID: lee.ball_appliedlanguage
Tel: +44 (0)844 854 8945
Applied Language Solutions
High quality language solutions delivered on
Ken,
Your new enhancements ROCK! Here are some numbers using rev 3675 and
IRSTLM 5.50.01
Machine: Core2Quad, 2.4 GHz, 4 GB RAM
Data: EN-NL sample data, 37,500 segments (micro test sample)
3-gram LM, 3-gram tables (for fast testing)
Train LM with SRILM
Train tables/tune/eval with
Thanks for sharing! It looks like my Moses build from scratch has
finally finished, so I'll be running some memory benchmarks today too.
Just so I understand: you ran a separate MERT for each of your three
cases? Then MERT randomness should explain the insignificant difference
in BLEU between
Yes, all scores and times were from scratch without reusing anything.
Precision Translation Tools will announce a simpler way to build a
Moses system from scratch next week: essentially, from a minimal server
configuration to a completely installed Moses system in four steps and 30
minute
Thanks, Ken, for all your feedback.
One more question. I'm using Moses with Boost. I uncommented the line
#define USE_BOOST in kenlm/util/string_piece.hh and recompiled Moses
without problems.
Then I uncommented #define USE_ICU, and ./configure fails with the error
log below. libicu-dev and
Revision 3671 introduces an updated version of kenlm. Queries are now
faster: there are no more string vocabulary lookups, and state is kept
between queries so backoffs cost less. The binary format has changed as a
result; please rebuild your binary files. Timings are forthcoming.
Kenneth
On 10/18/10 20:31, Kenneth Heafield
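The two query-side changes described above (integer IDs instead of string vocabulary lookups, and state carried between queries so backoffs are cheaper) can be illustrated with a toy model. This is a hypothetical Python sketch, not KenLM's implementation: `ToyBackoffLM`, `begin_sentence`, and `score` are invented names, the model is an in-memory dictionary rather than a binary file, it assumes the model contains `<unk>`, and the early exit in the n-gram search is a simplifying assumption. Word strings are mapped to IDs once at load time; each query returns a log10 probability plus a new state holding the matched context and its backoff weights, so the next query can apply backoff penalties without looking them up again.

```python
from typing import Dict, Tuple

# N-gram (oldest word first) -> (log10 probability, log10 backoff).
Entries = Dict[Tuple[str, ...], Tuple[float, float]]
# (context word IDs, newest first; backoffs[i] = backoff of the i+1 newest words)
State = Tuple[Tuple[int, ...], Tuple[float, ...]]


class ToyBackoffLM:
    """Hypothetical toy backoff model; illustrates the idea, not KenLM's code."""

    def __init__(self, order: int, entries: Entries) -> None:
        self.order = order
        # Assign integer IDs once, at load time, so queries compare ints.
        self.vocab: Dict[str, int] = {}
        for ngram in entries:
            for word in ngram:
                self.vocab.setdefault(word, len(self.vocab))
        self.table = {
            tuple(self.vocab[w] for w in ngram): value
            for ngram, value in entries.items()
        }

    def begin_sentence(self) -> State:
        # Start every sentence with the <s> context and its backoff weight.
        bos = self.vocab["<s>"]
        return (bos,), (self.table[(bos,)][1],)

    def score(self, state: State, word: str) -> Tuple[float, State]:
        """Return (log10 p(word | state), next state)."""
        context, context_backoffs = state
        wid = self.vocab.get(word)  # the only string lookup per query
        if wid is None:
            wid = self.vocab["<unk>"]  # assumes the model contains <unk>
        prob, backoff = self.table[(wid,)]
        new_backoffs = [backoff]
        matched = 0
        # Extend the match one context word at a time. Toy assumption: once an
        # n-gram is missing, no longer n-gram ending in `word` is stored.
        for j in range(1, len(context) + 1):
            key = tuple(reversed(context[:j])) + (wid,)
            if key not in self.table:
                break
            prob, backoff = self.table[key]
            new_backoffs.append(backoff)
            matched = j
        # Backoff weights of the unmatched longer contexts come from the
        # state, so no extra table lookups are needed here.
        log_prob = prob + sum(context_backoffs[matched:])
        keep = self.order - 1
        new_context = ((wid,) + context[:matched])[:keep]
        return log_prob, (new_context, tuple(new_backoffs[:keep]))
```

To score a sentence with this sketch, start from `begin_sentence()`, call `score` for each word and then for `</s>`, passing each returned state to the next call, and add up the log10 probabilities.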
Hi Ken,
I created an iARPA file with IRSTLM using the options -n 3 (2 grams), -b
(include the <s> sentence boundary) and -d (subdictionary for ngrams).
Then I used IRSTLM's compile-lm with --text yes to convert it to ARPA format.
Finally, I ran build_binary to binarize the ARPA file for KenLM.
The empty line after each n-gram block is not mandatory in the ARPA format
(see http://www.speech.sri.com/projects/srilm/manpages/ngram-format.5.html),
and IRSTLM does not produce it.
Best regards,
Nicola Bertoldi
On Oct 26, 2010, at 9:42 AM, supp...@precisiontranslationtools.com
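Nicola's point suggests reading an ARPA file by its section markers (\data\, \N-grams:, \end\) rather than relying on blank lines between blocks. Below is a minimal Python sketch of such a reader, assuming whitespace-separated fields; `read_arpa` is a hypothetical name, and this is not the parser used by KenLM, SRILM, or IRSTLM. It skips blank lines wherever they occur, treats a missing backoff as 0.0, and checks the n-gram counts from the \data\ header against the entries it actually read.

```python
import re
from typing import Dict, Iterable, Tuple

# n-gram (oldest word first) -> (log10 probability, log10 backoff)
Entries = Dict[Tuple[str, ...], Tuple[float, float]]


def read_arpa(lines: Iterable[str]) -> Tuple[Dict[int, int], Entries]:
    """Parse ARPA text, treating blank lines as optional separators."""
    counts: Dict[int, int] = {}
    entries: Entries = {}
    section = None  # None, "data", an n-gram order (int), or "end"
    # Lines before \data\ or after \end\ are ignored in this sketch.
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank lines carry no information; never require them
        header = re.fullmatch(r"\\(\d+)-grams:", line)
        if line == "\\data\\":
            section = "data"
        elif line == "\\end\\":
            section = "end"
        elif header:
            section = int(header.group(1))
        elif section == "data":
            m = re.fullmatch(r"ngram\s+(\d+)\s*=\s*(\d+)", line)
            if m is None:
                raise ValueError("malformed \\data\\ line: " + repr(line))
            counts[int(m.group(1))] = int(m.group(2))
        elif isinstance(section, int):
            fields = line.split()
            if len(fields) not in (section + 1, section + 2):
                raise ValueError("malformed n-gram line: " + repr(line))
            words = tuple(fields[1 : section + 1])
            has_backoff = len(fields) == section + 2
            backoff = float(fields[section + 1]) if has_backoff else 0.0
            entries[words] = (float(fields[0]), backoff)
    # Cross-check the header counts against what was actually read.
    for order, expected in counts.items():
        found = sum(1 for ngram in entries if len(ngram) == order)
        if found != expected:
            raise ValueError(f"expected {expected} {order}-grams, found {found}")
    return counts, entries
```

With this approach, a file with or without blank lines between blocks yields the same result, which is the behavior Nicola's reference to the format description implies a reader should have.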
I've fixed this in revision 3657 and tested that it works with a toy
IRSTLM example.
Sorry about that,
Kenneth
P.S. a faster version is under code review and coming soon.
On 10/26/10 03:57, Nicola Bertoldi wrote:
Thank you, Ken. I'll update my svn revision.
Tom
On Tue, 26 Oct 2010 10:18:17 -0400, Kenneth Heafield mo...@kheafield.com
wrote:
Thanks Ken. I tested it and it works.
FYI, on my first attempt there was a different error, something about the
<s> token (word?) being missing. I added the <s>/</s> tags and re-ran IRSTLM's
build-lm.sh script with option -b (include sentence boundary n-grams), and
the error disappeared.
It's pretty fast
Yes, I require <s> and </s> to appear in your ARPA file. These tags are
important from an output quality perspective (BLEU, etc.). I'll put that
in the documentation when I get around to writing it, but I personally
think IRST should include them by default.
Kenneth
On 10/26/10 12:30,
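Since KenLM requires <s> and </s> in the ARPA file, it may help to check for them before running build_binary. The following is a hypothetical Python pre-flight check, not part of KenLM or IRSTLM; `missing_boundary_tags` is an invented name, and it assumes whitespace-separated ARPA lines where the second field of each unigram entry is the word.

```python
from typing import List, Set


def missing_boundary_tags(arpa_text: str) -> List[str]:
    """Return which of <s> and </s> are absent from the \\1-grams: section."""
    unigrams: Set[str] = set()
    in_unigrams = False
    for raw in arpa_text.splitlines():
        line = raw.strip()
        if line.startswith("\\"):
            # Section markers (\data\, \1-grams:, \2-grams:, \end\) start
            # with a backslash; track whether we are inside the unigrams.
            in_unigrams = line == "\\1-grams:"
        elif in_unigrams and line:
            fields = line.split()
            if len(fields) >= 2:
                unigrams.add(fields[1])  # field 0 is the log10 probability
    return [tag for tag in ("<s>", "</s>") if tag not in unigrams]
```

An empty result means both tags are present. Otherwise, regenerating the model with the toolkit's sentence-boundary option (for IRSTLM's build-lm.sh, -b, as in the earlier message in this thread) is one way to add them.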
Thanks Ken. Nice work.
Is there a way to train the ARPA-formatted LM with KenLM, or do we need to
train with another tool, like SRILM, or convert the IRSTLM output to full ARPA format?
Thanks again,
Tom
On Mon, 18 Oct 2010 20:31:38 -0400, Kenneth Heafield mo...@kheafield.com
wrote:
Hi Moses,
KenLM is inference-only. It cannot create ARPA files. So you'll need
to use your favorite toolkit to generate the ARPA.
On 10/22/10 07:52, supp...@precisiontranslationtools.com wrote:
Thanks, Ken.
Tom
On Fri, 22 Oct 2010 10:15:21 -0400, Kenneth Heafield mo...@kheafield.com
wrote:
On 10/22/10 07:52, supp...@precisiontranslationtools.com wrote:
Hi,
I saw that the KenLM source code is distributed in the Moses svn and can be
enabled in configure. Is anybody here using it and willing to share some
experiences? Is it thread-safe, and can it be used in Moses together with SRI
and IRST? Any particular advantages? Is there any more information than
Hi Moses,
Introducing kenlm in Moses trunk. You no longer need to download a
separate language model toolkit to use Moses; kenlm is distributed with Moses and
compiled in by default on UNIX. This is thread-safe language model
inference code that returns the same probabilities as SRI (up to
floating