I believe the right answer to this is adding an OOV count feature to Moses. In fact, I've gone through and made all the language models return a struct indicating whether the word just scored was OOV. However, this still needs to make it into the phrases and ultimately the features. Also, there's the fun of adding a config option to moses.ini. Thoughts on default behavior?
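A minimal sketch of the idea, not the actual Moses or KenLM code: every name here (WordScore, ToyLM, HypothesisScore, ScoreWords) is hypothetical, and the toy unigram model stands in for a real language model. It shows a scorer returning a struct with an OOV flag, and a caller accumulating that flag into a separate count that could be exposed as its own feature.

```cpp
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical result type: log10 probability plus a flag saying
// whether the word just scored was out of vocabulary.
struct WordScore {
  float log_prob;
  bool oov;
};

// Toy unigram model, for illustration only. A real LM would condition
// on context; only the OOV bookkeeping matters here.
class ToyLM {
 public:
  ToyLM(std::unordered_map<std::string, float> vocab, float unk_log_prob)
      : vocab_(std::move(vocab)), unk_log_prob_(unk_log_prob) {}

  WordScore Score(const std::string &word) const {
    auto it = vocab_.find(word);
    if (it == vocab_.end()) {
      // Unknown word: fall back to the configured unknown-word score.
      return {unk_log_prob_, true};
    }
    return {it->second, false};
  }

 private:
  std::unordered_map<std::string, float> vocab_;
  float unk_log_prob_;
};

// Hypothetical per-hypothesis totals: the LM score and the OOV count
// are kept apart so each can receive its own weight.
struct HypothesisScore {
  float lm_log_prob = 0.0f;
  int oov_count = 0;
};

HypothesisScore ScoreWords(const ToyLM &lm,
                           const std::vector<std::string> &words) {
  HypothesisScore total;
  for (const std::string &word : words) {
    WordScore s = lm.Score(word);
    total.lm_log_prob += s.log_prob;
    if (s.oov) ++total.oov_count;
  }
  return total;
}
```

Keeping the count separate from the LM score means the penalty per OOV could be tuned like any other feature weight, rather than being fixed by whatever unknown-word probability the model happens to carry.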
You can control the unknown word probability by passing -u probability to build_binary. Set that to something negative. It will only be effective if the ARPA file was trained without <unk>. Also, is there any evidence out there for or against passing -unk to SRILM?

Kenneth

On 03/19/11 12:51, Alexander Fraser wrote:
> Hi Folks,
>
> Is there some way to penalize LM-OOVs when using Moses+KenLM? I saw a
> suggestion to create an open-vocab LM (I usually use closed-vocab) but
> I think this means that in some context a LM-OOV could be produced in
> preference to a non LM-OOV. This should not be the case in standard
> phrase-based SMT (e.g., using the feature functions used in the Moses
> baseline for the shared task for instance). Instead, Moses should
> produce the minimal number of LM-OOVs possible.
>
> There are exceptions to this when using different feature functions.
> For instance, we have a paper on trading off transliteration vs
> semantic translation (for Hindi to Urdu translation), where the
> transliterations are sometimes LM-OOV, but still a better choice than
> available semantic translations (which are not LM-OOV). But the
> overall SMT models we used supports this specific trade-off (and it
> took work to make the models do this correctly, this is described in
> the paper).
>
> I believe for the other three LM packages used with Moses the minimal
> number of LM-OOVs is always produced. I've switched back to
> Moses+SRILM for now due to this issue. I think it may be the case that
> Moses+KenLM actually produces the maximal number of OOVs allowed by
> the phrases loaded, which would be highly undesirable. Empirically, it
> certainly produces more than Moses+SRILM in my experiments.
>
> Thanks and Cheers, Alex
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
