Hi,

        This is hopefully a stupid question.  Did you turn on pruning?  I don't 
see it in the command line: "tlm -tr=toy.sent_start_end.en -lm=msb -n=5 
-o=toy.en.n5.lm".  Or did IRSTLM make pruning the default in new releases?

        KenLM should accept pruned models, and I take responsibility for 
that.  But I am also confused about how "to support them" could be 
missing from the model if pruning was off.
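
        For anyone who wants to see how many entries are affected, here is 
a minimal sketch (not part of KenLM or IRSTLM) that lists every n-gram whose 
context, meaning its first n-1 words, is not listed as an (n-1)-gram.  It 
assumes a plain-text ARPA file with "\N-grams:" section headers and 
whitespace-separated fields; the function names are made up for 
illustration.

```python
from collections import defaultdict


def read_arpa_ngrams(lines):
    """Return {order: set of word tuples} parsed from ARPA-format lines."""
    ngrams = defaultdict(set)
    order = None
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and the header with the n-gram counts.
        if not line or line == "\\data\\" or line.startswith("ngram "):
            continue
        if line == "\\end\\":
            break
        # Section header such as "\3-grams:" switches the current order.
        if line.startswith("\\") and line.endswith("-grams:"):
            order = int(line[1:line.index("-")])
            continue
        if order is None:
            continue
        fields = line.split()
        # fields[0] is the log10 probability, followed by `order` words
        # and an optional log10 backoff weight.
        ngrams[order].add(tuple(fields[1:1 + order]))
    return ngrams


def missing_contexts(ngrams):
    """Yield n-grams (n >= 2) whose first n-1 words are not an (n-1)-gram."""
    for order in sorted(ngrams):
        if order < 2:
            continue
        lower = ngrams.get(order - 1, set())
        for gram in sorted(ngrams[order]):
            if gram[:-1] not in lower:
                yield gram
```

To use it, open the model file, pass the file object to read_arpa_ngrams, 
and print what missing_contexts yields.  On the model in this thread it 
should report "to support them ." among others, assuming the file really is 
in the plain ARPA layout described above.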

Kenneth

On 02/16/2012 10:16 AM, Kenneth Heafield wrote:
> Hi,
>
>       Interesting.  The only other person to run into this is David Chiang,
> who had some custom software to prune/build models.
>
>       I have been requiring that property to make right state minimization
> work correctly: if the model has no entry for "to support them", then the
> right state contains at most "support them", rendering "to support them ."
> inaccessible.  I could reinsert "to support them" when this happens,
> with p(to support them) = b(to support)p(support them) and b(to support
> them) = 0 (in the log10 units of the ARPA file, i.e. a weight of 1).
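
The reinsertion described above can be sketched as follows.  This is only 
an illustration of the backoff arithmetic, not KenLM code: the function 
name and the two dictionaries are hypothetical, all values are log10 as in 
an ARPA file (so the product becomes a sum), and it assumes the shorter 
n-gram "support them" is itself present in the model.

```python
def reinsert_missing_context(context, log_prob, log_backoff):
    """Return (log10 prob, log10 backoff) for a context absent from the model.

    log_prob and log_backoff are hypothetical dicts mapping word tuples
    to the log10 values stored in an ARPA file.
    """
    shorter_context = context[:-1]  # e.g. ("to", "support")
    suffix = context[1:]            # e.g. ("support", "them")
    # Backoff rule: p(w | h) = b(h) * p(w | h'), a sum in log10 space.
    # An entry with no stored backoff counts as log10 backoff 0.
    prob = log_backoff.get(shorter_context, 0.0) + log_prob[suffix]
    # The reinserted entry gets log10 backoff 0, i.e. a weight of 1.
    return prob, 0.0
```

Because the reinserted probability is exactly what backing off would have 
produced and the new backoff weight is neutral, queries should score the 
same as before; the entry only exists so that "to support them ." has a 
context to hang from.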
>
>       It's a bit of a pain to do this correctly.  Would you be happy if only
> the default probing model supported it, but the trie continued to throw
> an error message?
>
>       The ARPA standard, to the extent that there is one, does not require
> this behavior, so IRSTLM is within its rights to prune them.
>
>       Nicola, how does IRSTLM handle these cases at inference time?
>
> Kenneth
>
> On 02/16/2012 07:59 AM, Sylvain Raybaud wrote:
>> Hi
>>
>>     LM stuff again!
>>
>> I've created a language model with IRSTLM (release 5.70.04):
>> tlm -tr=toy.sent_start_end.en -lm=msb -n=5 -o=toy.en.n5.lm
>>
>> When I specify type 1 (IRSTLM) in moses.ini it's loading fine. But if I
>> try to load it with KenLM I get:
>>
>> The context of every 4-gram should appear as a 3-gram Byte: 471440 File:
>> /global/markov/raybauds/DATA/TOY/toy.en.n5.lm
>>
>> Byte 471440 seems to be the '\n' between the following lines:
>> -1.16894        to support them .       -0.0679314
>> -0.836008       to deal with hamas
>>
>> As a matter of fact, "to support them" does not appear as a trigram in
>> the model. If I remove this 4-gram, the same problem arises with another
>> one whose 3-gram prefix is also missing, so I think this is the problem.
>> If I change the smoothing method to "sb" instead of "msb", I get a usable
>> LM. Is this normal behavior? Do you think it's a KenLM- or IRSTLM-related
>> problem?
>>
>>
>> cheers,
>>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
