Hi Moses,
If the trie uses too much memory, svn up to revision 4074 or later, then
pass "-a #bits" to build_binary. It will minimize memory usage subject to
the maximum number of bits you specify (so e.g. pass "-a 40" to minimize
memory usage). Compressing in this manner is lossless but takes
additional CPU (specifying a small number of bits, like 8, will cost less
CPU). You can use it with or without quantization.
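To make the memory trade-off behind "-a" concrete, here is a rough sketch of how a bit count could be chosen. It assumes the scheme described further down in this mail (drop the first b bits of each pointer, pay for a table of up to 2^b offsets). The function name choose_bits and the cost model are hypothetical illustrations, not what build_binary actually computes.

```python
def choose_bits(n_pointers, pointer_bits, max_bits):
    """Return (b, total_bits) minimizing an ASSUMED memory cost model.

    Assumed model (not taken from KenLM): removing b bits from each of
    n_pointers pointers leaves n_pointers * (pointer_bits - b) in-line bits,
    plus a secondary table of 2^b offsets, each wide enough to index the array.
    """
    # Bits needed for one offset, i.e. an index in the range 0..n_pointers.
    offset_bits = max(1, n_pointers.bit_length())
    best = None
    for b in range(0, min(max_bits, pointer_bits) + 1):
        inline = n_pointers * (pointer_bits - b)
        table = (1 << b) * offset_bits if b else 0  # no table when b == 0
        total = inline + table
        if best is None or total < best[1]:
            best = (b, total)
    return best
```

Under this toy model the best b grows with the number of pointers, and a lower cap simply stops the search earlier, which also keeps the secondary table (and the binary search over it) small.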
Here are some Moses runtimes (1 thread):
Configuration      Time (min)   RAM (GB)
Probing            72           7.71
Trie               80           5.24
Trie -a 32         105          4.84
Trie -q 8 -a 32    103          3.71   (8-bit quantized)
RandLM Backoff with 8-bit quantization and a 2^-8 false-positive rate
takes 277.9 minutes and 4.18 GB RAM. It also has a lower BLEU score due
to the false positives. RandLM Stupid still uses less memory. This also
means KenLM now wins (by having a version that is simultaneously faster,
smaller, and higher quality) against SRI, IRST, Berkeley, MIT, and Rand
(except Stupid).
I implemented a simple version of \newcite{compression}, which describes
how to compress sorted arrays of integers, saving memory at the
expense of time (quality is unchanged). This compresses trie pointers
but not word indices (word indices are harder).
My simpler (read: less space efficient but more time efficient) version
chops off the first $b$ bits from every trie pointer. Then a secondary
table stores $\leq 2^b$ offsets, one for each value of the first $b$
bits; each offset is the position in the array where that value first
appears. Reading a pointer consists of reading the in-line bits, doing
binary search in the secondary table for the pointer's position in the
array, and noticing that the index found in the secondary table is the
same as the first $b$ bits that were removed. The paper further observed
that the secondary table is itself sorted and applied the trick
recursively, but I haven't gone that far yet.
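The single-level scheme described above can be sketched roughly as follows. This is only an illustration over plain Python lists, not the KenLM code (which packs bits in C++). The names compress and read_pointer, and the choice to store one offset for every possible value of the first b bits (including unused ones), are my assumptions for the example.

```python
from bisect import bisect_right


def compress(pointers, total_bits, b):
    """Split a sorted list of pointers into in-line low bits plus an offset table.

    Returns (inline, offsets, low_bits). offsets[v] is the index of the first
    pointer whose top b bits are >= v, so the table has 2^b entries.
    """
    assert 0 <= b <= total_bits
    assert all(0 <= p < (1 << total_bits) for p in pointers)
    assert all(x <= y for x, y in zip(pointers, pointers[1:])), "must be sorted"
    low_bits = total_bits - b
    mask = (1 << low_bits) - 1
    inline = [p & mask for p in pointers]  # what stays in the trie record
    offsets = []
    i = 0
    for v in range(1 << b):
        # Advance to the first pointer whose removed high bits reach v.
        while i < len(pointers) and (pointers[i] >> low_bits) < v:
            i += 1
        offsets.append(i)
    return inline, offsets, low_bits


def read_pointer(inline, offsets, low_bits, i):
    """Recover the i-th original pointer."""
    # Binary search: the index of the last offset <= i is exactly the value
    # of the b high bits that were chopped off.
    high = bisect_right(offsets, i) - 1
    return (high << low_bits) | inline[i]
```

For example, compress([1, 2, 13], total_bits=4, b=2) keeps 2 bits in-line per pointer and builds a 4-entry table; read_pointer then reconstructs each value from its position alone. The binary search runs over at most 2^b entries, which is why a smaller b costs less CPU.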
Kenneth
@inproceedings{compression,
author={Bhiksha Raj and Ed Whittaker},
year={2003},
title={Lossless Compression of Language Model Structure and Word
Identifiers},
booktitle={Proceedings of IEEE International Conference on Acoustics,
Speech and Signal Processing},
pages={388--391},
}