Hi Folks,

Is there some way to penalize LM-OOVs when using Moses+KenLM? I saw a
suggestion to create an open-vocab LM (I usually use closed-vocab), but
I think this means that in some contexts an LM-OOV could be produced in
preference to a non-LM-OOV. This should not happen in standard
phrase-based SMT (e.g., using the feature functions of the Moses
baseline for the shared task). Instead, Moses should produce the
minimal number of LM-OOVs possible.
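
To make the concern concrete, here is a toy sketch (plain Python, not
Moses code; the numbers, weights, and feature names are invented) of
why an open-vocab LM on its own can prefer an LM-OOV hypothesis, and
how an explicit OOV-count feature with a negative weight would restore
the minimal-OOV behaviour:

# Toy illustration (not Moses code): an open-vocab LM gives <unk> a real
# log-probability, so a hypothesis containing an LM-OOV word can outscore a
# fully in-vocabulary one unless an OOV-count penalty feature is added.
def score(lm_logprob, oov_count, weights):
    # Standard log-linear model score, restricted to the two features at issue.
    return weights["lm"] * lm_logprob + weights["oov_penalty"] * oov_count

hyp_oov = {"lm_logprob": -4.2, "oov_count": 1}       # uses an LM-OOV word
hyp_in_vocab = {"lm_logprob": -5.0, "oov_count": 0}  # fully in-vocabulary

for name, w in [("no OOV penalty",   {"lm": 1.0, "oov_penalty": 0.0}),
                ("with OOV penalty", {"lm": 1.0, "oov_penalty": -10.0})]:
    s_oov = score(hyp_oov["lm_logprob"], hyp_oov["oov_count"], w)
    s_iv = score(hyp_in_vocab["lm_logprob"], hyp_in_vocab["oov_count"], w)
    winner = "LM-OOV hypothesis" if s_oov > s_iv else "in-vocab hypothesis"
    print("%s: OOV=%.1f, in-vocab=%.1f -> wins: %s" % (name, s_oov, s_iv, winner))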

There are exceptions to this when using different feature functions.
For instance, we have a paper on trading off transliteration vs.
semantic translation (for Hindi-to-Urdu translation), where the
transliterations are sometimes LM-OOV but still a better choice than
the available semantic translations (which are not LM-OOV). However,
the overall SMT models we used support this specific trade-off (and it
took work to make the models do this correctly; this is described in
the paper).

I believe that with the other three LM packages used with Moses, the
minimal number of LM-OOVs is always produced. I've switched back to
Moses+SRILM for now because of this issue. I suspect that Moses+KenLM
actually produces the maximal number of OOVs allowed by the phrases
loaded, which would be highly undesirable. Empirically, it certainly
produces more than Moses+SRILM in my experiments.
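
For reference, LM-OOVs in the decoder output can be counted roughly as
follows (just a sketch, assuming a one-word-per-line vocabulary file
extracted from the LM; the file names are placeholders):

import sys

def count_lm_oovs(output_path, vocab_path):
    # Count output tokens that are not in the LM vocabulary.
    with open(vocab_path, encoding="utf-8") as f:
        vocab = {line.strip() for line in f if line.strip()}
    oovs = total = 0
    with open(output_path, encoding="utf-8") as f:
        for line in f:
            for tok in line.split():
                total += 1
                if tok not in vocab:
                    oovs += 1
    return oovs, total

if __name__ == "__main__":
    # e.g. python count_lm_oovs.py decoder.out lm.vocab
    oovs, total = count_lm_oovs(sys.argv[1], sys.argv[2])
    print("%d LM-OOV tokens out of %d (%.2f%%)"
          % (oovs, total, 100.0 * oovs / total))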

Thanks and Cheers, Alex
