Hi Kenneth,

You mean log probability value = 0, I think? Probability = 0 in Moses-land is often redefined to mean a floored log probability value of -20 or -40 or whatever; this is what I thought the error message was referring to. Yes, that is what is causing the problem.

Cheers, Alex

On Sat, Mar 19, 2011 at 6:25 PM, Kenneth Heafield <[email protected]> wrote:
> The original behavior was to refuse to load any model without <unk>.
> Early on, Hieu asked me to change that. The default is now to
> substitute probability 0.0 and print this complaint to stderr:
>
> The ARPA file is missing <unk>. Substituting probability 0.0.
>
> SRI's ngram scoring tool skips OOVs, so a probability of 0.0 reproduces
> that behavior (though I still charge the backoff penalty from preceding
> words). I'm still not happy with it.
>
> Documentation like http://statmt.org/wmt11/baseline.html carries
> influence. Can you add -unk?
>
> On 03/19/11 13:07, Philipp Koehn wrote:
>> Hi,
>>
>> I have recently built all my language models with the "-unk" flag,
>> so it creates probability mass for unseen words (there is a line
>> for <unk> in the language model file).
>>
>> But I am actually not sure if the SRILM interface properly uses
>> this probability. It may just fall back to a very low floor.
>> So it may be that Alex's desired feature is just a bug, which can
>> be reproduced with kenlm by not training with "-unk", hence
>> also falling back to the floor probability (if that is what kenlm
>> is doing).
>>
>> -phi
>>
>> On Sat, Mar 19, 2011 at 4:59 PM, Kenneth Heafield <[email protected]> wrote:
>>> I believe the right answer to this is adding an OOV count feature to
>>> Moses. In fact, I've gone through and made all the language models
>>> return a struct indicating if the word just scored was OOV. However,
>>> this needs to make it into the phrases and ultimately the features.
>>> Also, there's the fun of adding a config option to moses.ini. Thoughts
>>> on default behavior?
>>>
>>> You can control the unknown word probability by passing -u probability
>>> to build_binary. Set that to something negative. It will only be
>>> effective if the ARPA file was trained without <unk>.
>>>
>>> Also, is there any evidence out there for or against passing -unk to
>>> SRILM?
>>>
>>> Kenneth
>>>
>>> On 03/19/11 12:51, Alexander Fraser wrote:
>>>> Hi Folks,
>>>>
>>>> Is there some way to penalize LM-OOVs when using Moses+KenLM? I saw a
>>>> suggestion to create an open-vocab LM (I usually use closed-vocab) but
>>>> I think this means that in some contexts an LM-OOV could be produced in
>>>> preference to a non LM-OOV. This should not be the case in standard
>>>> phrase-based SMT (e.g., using the feature functions used in the Moses
>>>> baseline for the shared task). Instead, Moses should
>>>> produce the minimal number of LM-OOVs possible.
>>>>
>>>> There are exceptions to this when using different feature functions.
>>>> For instance, we have a paper on trading off transliteration vs
>>>> semantic translation (for Hindi to Urdu translation), where the
>>>> transliterations are sometimes LM-OOV, but still a better choice than
>>>> available semantic translations (which are not LM-OOV). But the
>>>> overall SMT model we used supports this specific trade-off (and it
>>>> took work to make the models do this correctly; this is described in
>>>> the paper).
>>>>
>>>> I believe for the other three LM packages used with Moses the minimal
>>>> number of LM-OOVs is always produced. I've switched back to
>>>> Moses+SRILM for now due to this issue. I think it may be the case that
>>>> Moses+KenLM actually produces the maximal number of OOVs allowed by
>>>> the phrases loaded, which would be highly undesirable. Empirically, it
>>>> certainly produces more than Moses+SRILM in my experiments.
>>>>
>>>> Thanks and Cheers, Alex
>>>> _______________________________________________
>>>> Moses-support mailing list
>>>> [email protected]
>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
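
A toy sketch of the effect discussed in this thread, for readers who want to see the arithmetic. It assumes a made-up unigram model in log10 space; the vocabulary, the scores, and the function names are all hypothetical, and this is not KenLM or Moses code (it ignores backoff penalties and every other feature function). It only shows why an <unk> log probability of 0.0 can make a hypothesis containing an LM-OOV outscore one without, and why a negative value reverses that.

```python
# Toy illustration only: a unigram "LM" in log10 space, not KenLM or Moses code.
# The vocabulary and all numbers are invented for this example.
TOY_LOG10_PROBS = {"the": -1.0, "house": -2.5, "is": -1.5, "small": -3.0}


def score(words, unk_log10_prob):
    """Sum log10 probabilities, substituting unk_log10_prob for OOV words."""
    return sum(TOY_LOG10_PROBS.get(w, unk_log10_prob) for w in words)


def count_oovs(words):
    """Count words missing from the toy vocabulary (a crude OOV count feature)."""
    return sum(1 for w in words if w not in TOY_LOG10_PROBS)


def best(candidates, unk_log10_prob):
    """Return the candidate with the highest toy LM score."""
    return max(candidates, key=lambda c: score(c, unk_log10_prob))


in_vocab = ["the", "house", "is", "small"]  # no OOVs
with_oov = ["the", "haus", "is", "small"]   # "haus" is OOV in the toy model

# <unk> at log10 probability 0.0: the OOV costs nothing, so the hypothesis
# containing it scores higher than the fully in-vocabulary one.
print(best([in_vocab, with_oov], 0.0))

# <unk> at a negative value: the OOV is penalized and the in-vocabulary
# hypothesis is preferred.
print(best([in_vocab, with_oov], -20.0))
```

In this toy setting, any real word has a negative log probability, so a free <unk> is always the cheapest option. Setting a negative value (as with -u in build_binary) or adding a separate OOV count feature are the two remedies raised above.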
