[Mt-list] KenLM: Efficient language model estimation and queries

Kenneth Heafield Sat, 26 Jan 2013 04:36:12 -0800

Dear Colleagues,

The new release of KenLM, a language modeling toolkit, adds estimation with modified Kneser-Ney smoothing from text. Language model estimation is done with streaming on-disk algorithms. The amount of RAM to use is configurable, so it can scale to much larger models without approximation. It is about twice as fast as SRILM's ngram-count.

Querying language models with KenLM uses less RAM and less CPU than a variety of other toolkits. The library can be used directly or inside several decoders, e.g. Moses, cdec, Joshua, Ncode, and Kriya.

In both cases, the ARPA format is supported, so the estimation and querying parts are compatible with other toolkits.
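
For readers who have not worked with ARPA files, here is a minimal sketch of what that interchange format holds and how a backoff query reads it. This is not KenLM's code or API: the toy model, its numbers, and the names read_arpa and log10_prob are made up for illustration, and the sketch assumes tab-separated fields with log10 values, as ARPA files conventionally use.

```python
# Toy bigram model in ARPA layout. All numbers are invented for illustration.
# Each entry line is: log10 probability <TAB> n-gram [<TAB> log10 backoff].
TOY_ARPA = "\n".join([
    "\\data\\",
    "ngram 1=4",
    "ngram 2=2",
    "",
    "\\1-grams:",
    "-1.0\t<unk>",
    "-99\t<s>\t-0.5",
    "-0.7\tthe\t-0.3",
    "-0.9\t</s>",
    "",
    "\\2-grams:",
    "-0.2\t<s> the",
    "-0.4\tthe </s>",
    "",
    "\\end\\",
])


def read_arpa(text):
    """Return {n-gram tuple: (log10 prob, log10 backoff)} parsed from ARPA text."""
    table = {}
    order = 0
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, the count header, and the section delimiters.
        if not line or line.startswith("ngram ") or line in ("\\data\\", "\\end\\"):
            continue
        # A line such as "\2-grams:" announces the order of the entries below it.
        if line.startswith("\\") and line.endswith("-grams:"):
            order = int(line[1:line.index("-")])
            continue
        fields = line.split("\t")
        words = tuple(fields[1].split(" "))
        if len(words) != order:
            raise ValueError("entry does not match section order: " + line)
        # The backoff column is optional; a missing backoff means log10(1) = 0.
        backoff = float(fields[2]) if len(fields) > 2 else 0.0
        table[words] = (float(fields[0]), backoff)
    return table


def log10_prob(table, context, word):
    """Backoff lookup of log10 p(word | context); context is a tuple of words."""
    ngram = tuple(context) + (word,)
    if ngram in table:
        return table[ngram][0]
    if not context:
        # Unknown word: fall back to the <unk> unigram.
        return table[("<unk>",)][0]
    # Charge the context's backoff weight, then retry with a shorter context.
    backoff = table.get(tuple(context), (0.0, 0.0))[1]
    return backoff + log10_prob(table, tuple(context)[1:], word)


if __name__ == "__main__":
    model = read_arpa(TOY_ARPA)
    print(log10_prob(model, ("<s>",), "the"))   # bigram found directly
    print(log10_prob(model, ("the",), "the"))   # backs off to the unigram
```

Because the file is plain text with this layout, a model estimated by one toolkit can be loaded by another that reads the same format.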

Code is distributed under the LGPL (Boost and BSD-like licenses cover some parts). Documentation and downloads can be found at


http://www.kheafield.com/code/kenlm/

Kenneth Heafield (University of Edinburgh & Carnegie Mellon)


_______________________________________________
Mt-list mailing list
[email protected]
http://mailhost.computing.dcu.ie/mailman/listinfo/mt-list
