Hi Folks,

Is there some way to penalize LM-OOVs when using Moses+KenLM? I saw a suggestion to create an open-vocab LM (I usually use a closed-vocab one), but I think this means that in some contexts an LM-OOV could be produced in preference to a non-LM-OOV. That should not happen in standard phrase-based SMT (e.g., with the feature functions used in the Moses baseline for the shared task). Instead, Moses should produce the minimal number of LM-OOVs possible.
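To make the question concrete: what I was hoping for is an explicit OOV count on the LM feature that I can weight negatively. If I've understood the docs correctly there is an oov-feature switch on the LM feature functions, so the moses.ini would look something like the sketch below (the path is a placeholder; please correct me if I have the option name wrong or if it doesn't apply to KENLM):

  [feature]
  KENLM name=LM0 factor=0 order=5 path=/path/to/lm.arpa oov-feature=1

  [weight]
  LM0= 0.5 -1.0

With oov-feature=1 the LM should contribute two score components (the LM probability and an OOV count), so LM0 takes two weights, and a negative second weight would act as an LM-OOV penalty. Is that the intended way to do this, or is there a better mechanism?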
There are exceptions when other feature functions are involved. For instance, we have a paper on trading off transliteration against semantic translation (for Hindi-to-Urdu translation), where the transliterations are sometimes LM-OOV but are still a better choice than the available semantic translations (which are not LM-OOV). But the overall SMT models we used support this specific trade-off (and it took work to make the models do this correctly; this is described in the paper).

I believe the other three LM packages used with Moses always produce the minimal number of LM-OOVs. I've switched back to Moses+SRILM for now because of this issue. I suspect Moses+KenLM may actually produce the maximal number of LM-OOVs allowed by the loaded phrases, which would be highly undesirable. Empirically, it certainly produces more than Moses+SRILM in my experiments.

Thanks and Cheers,
Alex
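P.S. In case it's useful, this is roughly how I'm counting LM-OOVs when comparing the two set-ups: pull the unigram vocabulary out of the ARPA file and count output tokens that are missing from it. A rough sketch (paths come from the command line; it assumes a standard tab-separated ARPA file):

import sys

def arpa_vocab(path):
    # Collect the unigram vocabulary from the \1-grams: section of an ARPA file.
    vocab = set()
    in_unigrams = False
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if line.startswith("\\1-grams:"):
                in_unigrams = True
                continue
            if in_unigrams:
                if line.startswith("\\"):  # next section, e.g. \2-grams: or \end\
                    break
                fields = line.split("\t")
                if len(fields) >= 2:
                    vocab.add(fields[1])
    return vocab

def main(arpa_path, output_path):
    # Count decoder-output tokens that are missing from the LM vocabulary.
    vocab = arpa_vocab(arpa_path)
    total = oov = 0
    with open(output_path, encoding="utf-8") as f:
        for line in f:
            for tok in line.split():
                total += 1
                if tok not in vocab:
                    oov += 1
    print("LM-OOV tokens: %d / %d (%.2f%%)" % (oov, total, 100.0 * oov / max(total, 1)))

if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])

Run it as: python count_lm_oov.py lm.arpa decoder-output.txt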
