Hi,
I'd like to know what kind of pruning you used.
The basic problem is that the probing data structure allocates space
based on counts at the beginning of the ARPA file. When it detects some
missing entries (violations of the assumption that every substring of an
n-gram also appears), it reinserts these entries. That takes up some of
the free slots in the pre-sized hash table. Usually that adds about 1%
more entries, not more than 50% as seems to be the case here.
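To make the sizing argument concrete, here is a minimal sketch in Python. It is a simplified model and not KenLM's actual code: the function names are hypothetical, the entry counts are invented, and the multiplier of 1.5 is only an assumed starting value for illustration.

```python
import math


def probing_slots(counted_entries: int, multiplier: float) -> int:
    """Slots reserved up front: the header count scaled by the multiplier.

    At least one slot beyond the counted entries is kept so that a linear
    probe can always reach an empty slot (assumption of this toy model).
    """
    return max(counted_entries + 1, math.ceil(counted_entries * multiplier))


def fits(counted_entries: int, reinserted: int, multiplier: float) -> bool:
    """True if counted plus reinserted entries still leave a free slot."""
    return counted_entries + reinserted < probing_slots(counted_entries, multiplier)


# Hypothetical numbers: 1,000,000 n-grams announced in the ARPA header.
counted = 1_000_000
print(fits(counted, 10_000, 1.5))   # about 1% reinserted
print(fits(counted, 600_000, 1.5))  # 60% reinserted
print(fits(counted, 600_000, 2.0))  # same entries, larger multiplier
```

In this model the table only overflows when the reinserted entries eat up all of the blank space that the multiplier reserved, which is why raising the multiplier helps.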
tl;dr: pass -p 2.0 to build_binary. If that doesn't work, try a higher
value. Or turn off pruning.
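If you want a rough idea of how many entries are missing before picking a value, something like the following Python sketch may help. It is not part of KenLM; the helper names are hypothetical, it assumes a standard ARPA layout (log10 probability, the words, optional backoff), and it checks both the prefix and the suffix of each n-gram, following the "every substring" wording above. The count is only a rough lower bound, since a missing entry can itself require further lower-order entries.

```python
from collections import defaultdict


def read_arpa_ngrams(lines):
    """Return {order: set of word tuples} parsed from ARPA-format lines."""
    ngrams = defaultdict(set)
    order = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("\\"):
            # Section markers look like \data\, \2-grams: and \end\
            if line.endswith("-grams:"):
                order = int(line[1:line.index("-")])
            else:
                order = None
            continue
        if order is None:
            continue  # header counts such as "ngram 1=3"
        tokens = line.split()
        # tokens[0] is the log10 probability; the next `order` tokens are words
        ngrams[order].add(tuple(tokens[1:1 + order]))
    return ngrams


def missing_lower_order(ngrams):
    """Count, per order, the distinct (n-1)-grams that are absent."""
    missing = {}
    for order in sorted(ngrams):
        if order == 1:
            continue
        needed = set()
        for gram in ngrams[order]:
            needed.add(gram[:-1])  # prefix, e.g. "foo bar baz"
            needed.add(gram[1:])   # suffix, e.g. "bar baz quux"
        missing[order - 1] = len(needed - ngrams.get(order - 1, set()))
    return missing


# Tiny invented model: the bigram "bar baz" was pruned, but the trigram
# "foo bar baz" is still present.
sample = """\\data\\
ngram 1=3
ngram 2=1
ngram 3=1

\\1-grams:
-1.0\tfoo\t-0.3
-1.0\tbar\t-0.3
-1.0\tbaz\t-0.3

\\2-grams:
-0.5\tfoo bar\t-0.2

\\3-grams:
-0.2\tfoo bar baz

\\end\\
""".splitlines()

print(missing_lower_order(read_arpa_ngrams(sample)))  # {1: 0, 2: 1}
```

Comparing the missing count for each order against the count in the ARPA header gives a feel for how much blank space is needed. On a real file you would pass the open file object instead of the sample lines.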
Kenneth
On 08/08/13 15:59, Jean D'Ennris wrote:
>
> Dear All,
>
> I am having some trouble with the binarisation of a POS language model.
>
> It was successfully compiled into an arpa file.
>
> When I run the command:
>
> /home/Moses/mosesdecoder/bin/build_binary -i lm.pos.arpa lm.pos.blm
>
> I get the following message:
>
>
> lm/search_hashed.cc:276 in void
> lm::ngram::detail::HashedSearch<Value>::ApplyBuild(util::FilePiece&,
> const std::vector<long unsigned int, std::allocator<long unsigned int> >&,
> const lm::ngram::Config&, const lm::ngram::ProbingVocabulary&,
> lm::PositiveProbWarn&, const Build&) [with Build =
> lm::ngram::NoRestBuild, Value = lm::ngram::BackoffValue] threw
> util::ProbingSizeException'.
> Avoid pruning n-grams like "bar baz quux" when "foo bar baz quux" is
> still in the model. KenLM will work when this pruning happens, but the
> probing model assumes these events are rare enough that using blank
> space in the probing hash table will cover all of them. Increase
> probing_multiplier (-p to build_binary) to add more blank spaces.
> Byte: 16380 File: lm.pos.arpa
> ERROR
>
>
> Any help will be appreciated
>
> Thank you
>
> Kind Regards
>
> Jean E.
>
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>