Dear Colleagues,
The new release of KenLM, a language modeling toolkit, adds language model
estimation from text with modified Kneser-Ney smoothing. Estimation uses
streaming on-disk algorithms with a configurable amount of RAM, so it can
scale to much larger models without approximation. It is about twice as
fast as SRILM's ngram-count.
Querying language models with KenLM uses less RAM and less CPU than a
variety of other toolkits. The library can be used directly or inside
several decoders, e.g. Moses, cdec, Joshua, Ncode, and Kriya.
Both estimation and querying support the ARPA format, so each part is
compatible with other toolkits.
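For readers unfamiliar with the ARPA format, here is a minimal Python sketch
of how a reader might load such a file and apply the standard backoff rule.
It is an illustration only, not KenLM's implementation: the toy model, its
probabilities, and the helper names parse_arpa and log10_prob are
hypothetical, and unknown-word handling is omitted.

```python
# A tiny hand-written ARPA file. All values are made up for illustration.
# Each entry line is: log10 probability, TAB, n-gram, optional TAB + log10 backoff.
ARPA_TEXT = (
    "\\data\\\n"
    "ngram 1=4\n"
    "ngram 2=2\n"
    "\n"
    "\\1-grams:\n"
    "-1.0\t<s>\t-0.5\n"
    "-0.7\ta\t-0.3\n"
    "-0.9\tb\n"
    "-1.2\t</s>\n"
    "\n"
    "\\2-grams:\n"
    "-0.2\t<s> a\n"
    "-0.4\ta b\n"
    "\n"
    "\\end\\\n"
)


def parse_arpa(text):
    """Return {ngram_tuple: (log10_prob, log10_backoff)} from ARPA text."""
    table = {}
    in_ngrams = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("\\"):
            # Section markers look like \data\, \1-grams: and \end\
            in_ngrams = line.endswith("-grams:")
            continue
        if not in_ngrams:
            continue  # skip the "ngram N=count" header lines
        fields = line.split("\t")
        prob = float(fields[0])
        words = tuple(fields[1].split(" "))
        # A missing backoff weight is treated as 0.0 in log10 space.
        backoff = float(fields[2]) if len(fields) > 2 else 0.0
        table[words] = (prob, backoff)
    return table


def log10_prob(table, context, word):
    """log10 p(word | context), backing off to shorter contexts as needed."""
    context = tuple(context)
    ngram = context + (word,)
    if ngram in table:
        return table[ngram][0]
    if not context:
        raise KeyError(word)  # this sketch has no <unk> handling
    # Charge the context's backoff weight, then drop its oldest word.
    backoff = table.get(context, (0.0, 0.0))[1]
    return backoff + log10_prob(table, context[1:], word)


if __name__ == "__main__":
    model = parse_arpa(ARPA_TEXT)
    print(log10_prob(model, ("a",), "b"))     # bigram listed in the file
    print(log10_prob(model, ("a",), "</s>"))  # backs off to the unigram
```

Because the format is plain text with one n-gram per line, a model written
by one toolkit can be read by another, which is what makes the estimation
and querying parts interchangeable.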
Code is distributed under the LGPL (Boost and BSD-like licenses cover
some parts). Documentation and downloads can be found at
http://www.kheafield.com/code/kenlm/
Kenneth Heafield (University of Edinburgh & Carnegie Mellon)
_______________________________________________
Mt-list mailing list
http://mailhost.computing.dcu.ie/mailman/listinfo/mt-list