Yes, log probability 0.0. I should fix the error message...
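The three OOV policies discussed in this thread — KenLM substituting log probability 0.0, flooring to a large negative value like -20 or -40, and SRILM's ngram tool skipping OOVs entirely — can be illustrated with a toy scorer. This is an editorial sketch, not Moses or KenLM code; the tiny unigram model and sentence are made up:

```python
# Toy unigram log10 probabilities; <unk> is deliberately absent,
# mimicking a closed-vocabulary ARPA file.
LOGPROBS = {"the": -0.5, "cat": -1.2, "sat": -1.4}

def score(words, unk_policy="zero", floor=-20.0):
    """Sum log10 probabilities under one of three OOV policies:
    'zero'  - substitute log probability 0.0 (an OOV costs nothing),
    'floor' - substitute a large negative floor such as -20 or -40,
    'skip'  - leave OOVs out of the sum entirely."""
    total = 0.0
    for w in words:
        if w in LOGPROBS:
            total += LOGPROBS[w]
        elif unk_policy == "zero":
            total += 0.0     # OOV is free
        elif unk_policy == "floor":
            total += floor   # OOV is heavily penalized
        elif unk_policy == "skip":
            continue         # OOV is not scored at all

    return total

sentence = ["the", "xyzzy", "cat"]        # "xyzzy" is OOV
print(score(sentence, "zero"))            # -1.7
print(score(sentence, "floor"))           # -21.7
print(score(sentence, "skip"))            # -1.7
```

Note that in this unigram sketch "zero" and "skip" coincide; in a real backoff model they differ because, as Kenneth notes below, the backoff penalty from preceding words is still charged around an OOV.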
On 03/19/11 14:16, Alexander Fraser wrote:
> Hi Kenneth,
>
> You mean log probability value = 0, I think? Probability = 0 in
> Moses-land is often redefined to mean a floored log probability value
> of -20 or -40 or whatever; this is what I thought the error message
> was referring to. Yes, that is what is causing the problem.
>
> Cheers, Alex
>
>
> On Sat, Mar 19, 2011 at 6:25 PM, Kenneth Heafield <[email protected]> wrote:
>> The original behavior was to refuse to load any model without <unk>.
>> Early on, Hieu asked me to change that. The default is now to
>> substitute probability 0.0 and print this complaint to stderr:
>>
>> The ARPA file is missing <unk>. Substituting probability 0.0.
>>
>> SRI's ngram scoring tool skips OOVs, so a probability of 0.0 reproduces
>> that behavior (though I still charge the backoff penalty from preceding
>> words). I'm still not happy with it.
>>
>> Documentation like http://statmt.org/wmt11/baseline.html carries
>> influence. Can you add -unk?
>>
>> On 03/19/11 13:07, Philipp Koehn wrote:
>>> Hi,
>>>
>>> I have recently built all my language models with the "-unk" flag,
>>> so it creates probability mass for unseen words (there is a line
>>> for <unk> in the language model file).
>>>
>>> But I am actually not sure if the SRILM interface properly uses
>>> this probability. It may just fall back to a very low floor.
>>> So it may be that Alex's desired feature is just a bug, which can
>>> be reproduced with kenlm by not training with "-unk", hence
>>> also falling back to the floor probability (if that is what kenlm
>>> is doing).
>>>
>>> -phi
>>>
>>> On Sat, Mar 19, 2011 at 4:59 PM, Kenneth Heafield <[email protected]> wrote:
>>>> I believe the right answer to this is adding an OOV count feature to
>>>> Moses. In fact, I've gone through and made all the language models
>>>> return a struct indicating whether the word just scored was OOV. However,
>>>> this needs to make it into the phrases and ultimately the features.
>>>> Also, there's the fun of adding a config option to moses.ini. Thoughts
>>>> on default behavior?
>>>>
>>>> You can control the unknown word probability by passing -u probability
>>>> to build_binary. Set that to something negative. It will only be
>>>> effective if the ARPA file was trained without <unk>.
>>>>
>>>> Also, is there any evidence out there for or against passing -unk to
>>>> SRILM?
>>>>
>>>> Kenneth
>>>>
>>>> On 03/19/11 12:51, Alexander Fraser wrote:
>>>>> Hi Folks,
>>>>>
>>>>> Is there some way to penalize LM-OOVs when using Moses+KenLM? I saw a
>>>>> suggestion to create an open-vocab LM (I usually use closed-vocab), but
>>>>> I think this means that in some context an LM-OOV could be produced in
>>>>> preference to a non-LM-OOV. This should not be the case in standard
>>>>> phrase-based SMT (e.g., using the feature functions from the Moses
>>>>> baseline for the shared task). Instead, Moses should produce the
>>>>> minimal number of LM-OOVs possible.
>>>>>
>>>>> There are exceptions to this when using different feature functions.
>>>>> For instance, we have a paper on trading off transliteration vs.
>>>>> semantic translation (for Hindi to Urdu translation), where the
>>>>> transliterations are sometimes LM-OOV but still a better choice than
>>>>> the available semantic translations (which are not LM-OOV). But the
>>>>> overall SMT models we used support this specific trade-off (and it
>>>>> took work to make the models do this correctly; this is described in
>>>>> the paper).
>>>>>
>>>>> I believe the other three LM packages used with Moses always produce
>>>>> the minimal number of LM-OOVs. I've switched back to Moses+SRILM for
>>>>> now due to this issue. I think it may be the case that Moses+KenLM
>>>>> actually produces the maximal number of OOVs allowed by the phrases
>>>>> loaded, which would be highly undesirable. Empirically, it certainly
>>>>> produces more than Moses+SRILM in my experiments.
>>>>>
>>>>> Thanks and Cheers, Alex

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
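Kenneth's proposed fix — an OOV count feature — amounts to counting LM-OOVs in each hypothesis and letting a tuned (typically negative) weight penalize them, so hypotheses with fewer OOVs win ties instead of OOVs being free. A minimal editorial sketch of that idea; the vocabulary, weight, and hypotheses below are hypothetical, and this is not Moses code:

```python
def oov_count(hypothesis, lm_vocab):
    """Count words in the hypothesis that the language model has never seen."""
    return sum(1 for w in hypothesis if w not in lm_vocab)

def with_oov_feature(model_score, hypothesis, lm_vocab, oov_weight=-1.0):
    """Add the OOV count, scaled by a tuned (usually negative) weight,
    to the rest of the model score."""
    return model_score + oov_weight * oov_count(hypothesis, lm_vocab)

vocab = {"the", "cat", "sat"}

# Two competing hypotheses with the same base model score:
a = ["the", "cat", "sat"]       # 0 LM-OOVs
b = ["the", "xyzzy", "sat"]     # 1 LM-OOV

print(with_oov_feature(-10.0, a, vocab))  # -10.0
print(with_oov_feature(-10.0, b, vocab))  # -11.0
```

With the feature in place the decoder prefers hypothesis a, which is the "minimal number of LM-OOVs" behavior Alex describes; setting the weight to 0 recovers the current KenLM behavior where OOVs cost nothing.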
