kenlm now supports quantization.  To use it, svn up, then run
build_binary with -q:

kenlm/build_binary -q 8 trie foo.arpa foo.out

for 8 bits.  You can choose from 2 to 25 bits, inclusive.  Currently,
probability and backoff are quantized separately; by default, -q sets
the number of bits for both (in this case, 8 bits each).  You can use -b
to set the number of bits for backoff independently, e.g.

kenlm/build_binary -q 8 -b 7 trie foo.arpa foo.out

As always, you can get a memory estimate by omitting the output file, e.g.

kenlm/build_binary -q 8 trie foo.arpa

There are 2^bits - 1 probability values (one code is reserved for the
blanks left when SRI prunes where it shouldn't) and 2^bits - 2 non-zero
backoff values (two codes are reserved to mark zero backoff: one for
n-grams that extend to the right and one for those that don't).  Because
these reserved values make the number of bins not a power of two, it's
hard to support qARPA.  IRSTLM doesn't optimize for the case where a
context is known not to extend, so it doesn't need two reserved backoff
values.
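For concreteness, the bin counts above work out as follows (just the arithmetic from the reserved-value rules, not kenlm code):

```python
# At -q 8: one probability code is reserved for blanks, and two backoff
# codes are reserved for the zero-backoff cases described above.
bits = 8
prob_bins = 2**bits - 1     # 255 usable probability bins
backoff_bins = 2**bits - 2  # 254 usable non-zero backoff bins
print(prob_bins, backoff_bins)
```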

Currently it uses a simple reimplementation of IRSTLM's binning method:

M. Federico and N. Bertoldi. 2006. How many bits are needed to store
probabilities for phrase-based translation? In Proc. of the Workshop on
Statistical Machine Translation, pages 94–101, New York City, June.
Association for Computational Linguistics.

Plugging in other quantization methods should be relatively simple now.
I haven't done a quality evaluation yet.
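To illustrate the idea, here is a minimal sketch of that binning approach (equal-population bins whose representative value is the bin mean), assuming NumPy.  This is only an illustration of the Federico & Bertoldi method, not kenlm's actual C++ code; the function name and interface are made up for the example:

```python
import numpy as np

def bin_quantize(values, bits, reserved=1):
    """Quantize floats into at most 2**bits - reserved bins.

    Sorts the values, splits them into roughly equal-population bins,
    and uses each bin's mean as its codebook entry.
    """
    values = np.asarray(values, dtype=float)
    num_bins = 2**bits - reserved
    sorted_vals = np.sort(values)
    # Equal-population split; drop empty chunks if there are fewer
    # values than bins.
    chunks = [c for c in np.array_split(sorted_vals, num_bins) if len(c)]
    centers = np.array([c.mean() for c in chunks])
    # Encode each value as the index of its nearest codebook entry.
    codes = np.abs(values[:, None] - centers[None, :]).argmin(axis=1)
    return codes, centers

# Example: quantize 10000 log-probability-like values to 8 bits.
rng = np.random.RandomState(0)
vals = rng.uniform(-10.0, 0.0, size=10000)
codes, centers = bin_quantize(vals, 8)
print(len(centers))                                  # at most 255 entries
print(float(np.abs(vals - centers[codes]).mean()))   # mean reconstruction error
```

Equal-population binning spends codebook entries where values are dense, which suits the skewed distributions of log probabilities and backoffs.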

It only works with trie.  If you're quantizing, you're probably worried
about memory, so trie is probably what you want anyway.

Kenneth
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
