Dear Colleagues,

The new release of KenLM, a language modeling toolkit, adds language model estimation from text with modified Kneser-Ney smoothing. Estimation uses streaming on-disk algorithms, and the amount of RAM it uses is configurable, so it scales to much larger models without approximation. It is about twice as fast as SRILM's ngram-count.
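
For readers unfamiliar with modified Kneser-Ney smoothing, here is a minimal sketch of the textbook closed-form discount estimates from Chen and Goodman. It is not taken from KenLM's source. The function name and the example counts are made up for illustration; n1 to n4 are the numbers of n-grams of a given order whose (adjusted) count is exactly 1, 2, 3 and 4.

```python
def mkn_discounts(n1, n2, n3, n4):
    """Closed-form modified Kneser-Ney discounts for one n-gram order.

    n1..n4: number of n-grams whose (adjusted) count is exactly 1..4.
    Returns (D1, D2, D3plus), the discounts applied to n-grams seen
    once, twice, and three or more times.
    """
    if min(n1, n2, n3) <= 0:
        raise ValueError("n1, n2 and n3 must be positive")
    y = n1 / (n1 + 2 * n2)
    d1 = 1 - 2 * y * n2 / n1
    d2 = 2 - 3 * y * n3 / n2
    d3plus = 3 - 4 * y * n4 / n3
    return d1, d2, d3plus


# Hypothetical count-of-counts, chosen only to illustrate the arithmetic.
d1, d2, d3plus = mkn_discounts(n1=100, n2=50, n3=30, n4=20)
```

The three separate discounts are what distinguish the modified variant from original Kneser-Ney smoothing, which uses a single discount per order.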

Querying language models with KenLM uses less RAM and less CPU than a variety of other toolkits. The library can be used directly or inside several decoders, e.g., Moses, cdec, Joshua, Ncode, and Kriya.

Both estimation and querying support the ARPA format, so each part is compatible with other toolkits.
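
To illustrate why the ARPA format makes toolkits interchangeable, here is a minimal sketch of a reader for it in plain Python. It is not KenLM code. The function name and the tiny sample model are made up for illustration, and the sketch assumes whitespace-separated fields (files written by toolkits normally use tabs between the log probability, the n-gram and the backoff).

```python
def read_arpa(lines):
    """Parse ARPA text into (counts, ngrams).

    counts: {order: number of n-grams declared in the header}
    ngrams: {tuple of words: (log10 probability, log10 backoff)}
    A missing backoff weight is stored as 0.0.
    """
    counts, ngrams, order = {}, {}, None
    for raw in lines:
        line = raw.strip()
        if not line or line == "\\data\\":
            continue
        if line == "\\end\\":
            break
        if line.startswith("ngram "):
            n, count = line[len("ngram "):].split("=")
            counts[int(n)] = int(count)
        elif line.startswith("\\") and line.endswith("-grams:"):
            order = int(line[1:-len("-grams:")])
        elif order is not None:
            tokens = line.split()
            words = tuple(tokens[1:1 + order])
            has_backoff = len(tokens) > 1 + order
            backoff = float(tokens[1 + order]) if has_backoff else 0.0
            ngrams[words] = (float(tokens[0]), backoff)
    return counts, ngrams


# Hypothetical toy model; the numbers are not from any real corpus.
SAMPLE = r"""\data\
ngram 1=3
ngram 2=1

\1-grams:
-1.0 <s> -0.5
-0.7 hello -0.3
-0.9 </s>

\2-grams:
-0.2 <s> hello

\end\
"""

counts, ngrams = read_arpa(SAMPLE.splitlines())
```

Because the format is plain text with one n-gram per line, a model estimated by one toolkit can be loaded by another without conversion.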

The code is distributed under the LGPL; some parts are covered by Boost and BSD-like licenses. Documentation and downloads are available at:

http://www.kheafield.com/code/kenlm/

Kenneth Heafield (University of Edinburgh & Carnegie Mellon)

