Hi,
This looks like a bug in the trie implementation, caused by some recent
changes I made for left state minimization. I'll fix it soon. As a
workaround, pass a large value for the -m option to build_binary.
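
A rough sketch of that workaround, reusing the command from the quoted
message below. It assumes -m takes a size in MB in this version of
build_binary (check its usage message), and 8192 is only an illustrative
value, not a recommendation from this thread. The command is printed
rather than executed, so nothing runs until you paste it yourself.

```shell
# Path to build_binary as given in the quoted message.
BUILD_BINARY=/nfs/staging/turchmo/moses/kenlmNew/build_binary

# Assumed: -m is measured in MB. 8192 is a hypothetical value;
# choose one that fits the RAM actually available on the machine.
SORT_MEM_MB=8192

# Same options as the original command, with -m added.
CMD="$BUILD_BINARY -i -t /tmp/ -m $SORT_MEM_MB -q 8 -b 8 trie irstLM.ARPA.txt irstLanguageModel.binary.lm"

# Dry run: show the command instead of running it.
echo "$CMD"
```
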
Sorry,
Kenneth
On 10/08/11 11:34, marco turchi wrote:
> Dear All,
> I'm trying to build an LM from a large dataset (> 11 M sentences). I
> have generated the ARPA file with IRSTLM and now I'd like to
> binarize it using KenLM.
>
> I called build_binary to estimate memory usage and got this:
>
> Memory estimate:
> type       MB
> probing 16129  assuming -p 1.5
> trie     7462  without quantization
> trie     4361  assuming -q 8 -b 8 quantization
> trie     6440  assuming -a 22 array pointer compression
> trie     3339  assuming -a 22 -q 8 -b 8 array pointer compression and quantization
>
> Then I ran the binarization like this:
>
> /nfs/staging/turchmo/moses/kenlmNew/build_binary -i -t /tmp/ -q 8 -b 8 trie irstLM.ARPA.txt irstLanguageModel.binary.lm
>
> but I got this error:
>
> lm/search_trie.cc:409 in void
> lm::ngram::trie::<unnamed>::SanityCheckCounts(const std::vector<long
> unsigned int, std::allocator<long unsigned int> >&, const
> std::vector<long unsigned int, std::allocator<long unsigned int> >&)
> threw util::Exception'.
> Longest count should be constant but it changed from 289546423 to
> 289546405 Byte: 37297517525
>
> I searched the mailing list but did not find any post with the
> same error.
>
> Any ideas?
>
> Thanks a lot
> Marco
>
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support