Hi,
Interesting. The only other person to run into this is David Chiang,
who had some custom software to prune/build models.
I have been requiring that property (the context of every n-gram must
itself appear as an (n-1)-gram) to make right state minimization work
correctly: if the model doesn't match "to support them" then the right
state contains at most "support them", rendering "to support them ."
inaccessible. I could reinsert "to support them" when this happens,
with p(to support them) = b(to support)p(support them) and a backoff of
b(to support them) = 0 in log space (i.e. no penalty).
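To make the reinsertion concrete, here is a minimal Python sketch, working in log10 as ARPA files do. The dictionary layout, the function names and all the numbers are made up for illustration; this is not KenLM code, just the arithmetic described above.

```python
def score(model, context, word):
    """log10 p(word | context) using standard backoff.

    model maps a tuple of words to (log10 prob, log10 backoff).
    """
    ngram = context + (word,)
    if ngram in model:
        return model[ngram][0]
    if not context:
        raise KeyError(f"unknown word: {word}")
    # A context that is absent from the model contributes a log backoff of 0.
    backoff = model.get(context, (0.0, 0.0))[1]
    return backoff + score(model, context[1:], word)


def reinsert_context(model, context):
    """Add a missing context n-gram with its backed-off probability."""
    if context not in model:
        prob = score(model, context[:-1], context[-1])
        model[context] = (prob, 0.0)  # log backoff 0, i.e. no penalty
    return model[context]


# Hypothetical values, chosen only to make the arithmetic easy to follow.
model = {
    ("to", "support"): (-2.0, -0.5),
    ("support", "them"): (-1.5, -0.25),
}
entry = reinsert_context(model, ("to", "support", "them"))
print(entry)  # (-2.0, 0.0): -0.5 backoff plus -1.5 probability
```

The reinserted entry scores exactly what backoff would have produced anyway, so the model's probabilities are unchanged; only the missing context becomes reachable.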
It's a bit of a pain to do this correctly. Would you be happy if only
the default probing model supported it, but the trie continued to throw
an error message?
The ARPA standard, to the extent that there is one, does not require
this property, so IRSTLM is within its rights to prune these n-grams.
Nicola, how does IRSTLM handle these cases at inference time?
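In the meantime, a rough way to list every n-gram whose context is missing, before trying to load the file, is sketched below in Python. It assumes the usual ARPA layout (sections headed "\N-grams:" in increasing order, each line holding a log probability, the words, and an optional backoff); the function name and the toy model are invented for the example.

```python
import re

SECTION = re.compile(r"^\\(\d+)-grams:$")


def missing_contexts(arpa_lines):
    """Return (ngram, context) pairs where the context is not in the model."""
    seen = set()
    missing = []
    order = 0  # 0 until the first "\N-grams:" header is reached
    for raw in arpa_lines:
        line = raw.strip()
        header = SECTION.match(line)
        if header:
            order = int(header.group(1))
            continue
        if line == "\\end\\":
            break
        if order == 0 or not line:
            continue  # skip the \data\ header block and blank lines
        fields = line.split()
        ngram = tuple(fields[1:1 + order])  # drop log prob and any backoff
        seen.add(ngram)
        # Relies on lower orders having been read before higher ones.
        if order > 1 and ngram[:-1] not in seen:
            missing.append((ngram, ngram[:-1]))
    return missing


# Toy model with made-up values: the bigram "to support" is deliberately absent.
EXAMPLE = r"""\data\
ngram 1=3
ngram 2=1
ngram 3=1

\1-grams:
-1.0 to -0.5
-1.0 support -0.5
-1.0 them -0.5

\2-grams:
-0.7 support them -0.3

\3-grams:
-0.4 to support them

\end\
"""

result = missing_contexts(EXAMPLE.splitlines())
print(result)  # [(('to', 'support', 'them'), ('to', 'support'))]
```

On a real file one would pass the open file object instead of EXAMPLE.splitlines().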
Kenneth
On 02/16/2012 07:59 AM, Sylvain Raybaud wrote:
> Hi
>
> LM stuff again!
>
> I've created a language model with IRSTLM (release 5.70.04):
> tlm -tr=toy.sent_start_end.en -lm=msb -n=5 -o=toy.en.n5.lm
>
> When I specify type 1 (IRSTLM) in moses.ini it's loading fine. But if I
> try to load it with KenLM I get:
>
> The context of every 4-gram should appear as a 3-gram Byte: 471440 File:
> /global/markov/raybauds/DATA/TOY/toy.en.n5.lm
>
> Byte 471440 seems to be the '\n' between the following lines:
> -1.16894 to support them . -0.0679314
> -0.836008 to deal with hamas
>
> As a matter of fact, "to support them" does not appear as a trigram in
> the model. If I remove this 4-gram the same problem arises with another
> one, whose 3-gram prefix is also missing. I think this is the problem. If
> I change the smoothing method to "sb" instead of "msb" I get a usable
> LM. Is this normal behavior? Do you think it's a KenLM- or an
> IRSTLM-related problem?
>
>
> cheers,
>
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support