>> I allow pass-through of all words, with a penalty that is also
>> learned by MERT.
> Interesting stuff. Do you have results published on this?
This was easiest to implement when I wrote cdec, and the results
seemed good enough, so I never did a proper comparison. I will
describe the newer innovation of giving OOVs a tunable penalty in
this year's WMT system description.
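
In feature-function terms, the pass-through penalty is just a count
of OOV source words copied verbatim into the output, and MERT learns
a weight for that count like for any other feature. A minimal sketch
in Python (the names and the tm_vocab set are my own illustration,
not cdec's actual interface):

    def passthrough_count(hyp_tokens, tm_vocab):
        # Fires once per hypothesis token that the translation model
        # does not know, i.e. per source word passed through verbatim.
        # MERT typically learns a negative weight for this feature.
        return sum(1 for tok in hyp_tokens if tok not in tm_vocab)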

Also, for completeness on this topic: I have a third tunable feature
that counts the number of non-ASCII characters in the target. So far,
I've only used this when translating into English from Chinese or
Arabic.
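
A sketch of that third feature, under the same caveat that this is
an illustration rather than the real implementation:

    def non_ascii_count(target_tokens):
        # Fires once per non-ASCII character in the target; when
        # translating into English, source-script characters leaking
        # through are usually a bad sign.
        return sum(1 for tok in target_tokens
                   for ch in tok if ord(ch) > 127)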

>
>> With the open-class LM, I use the -unk option in SRILM,
>> which reserves a bit of probability mass for OOVs. What exactly it
>> does is a bit unclear to me (it's more than just replacing singletons
>> with <unk>, but that's probably a reasonable approximation).
>
> I would assume that it does the usual discounting (Good-Turing
> or Kneser-Ney), and gives the discounted probability mass to
> <unk>.
>
> -phi
>
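
For concreteness: SRILM's -unk option builds an open-vocabulary
model in which <unk> is treated as a regular word, so the usual
discounting applies to it too. A toy absolute-discounting unigram
(my own illustration, not SRILM's actual algorithm; the 0.75
discount is arbitrary) shows where the <unk> mass would come from:

    from collections import Counter

    def unigram_with_unk(tokens, discount=0.75):
        counts = Counter(tokens)
        total = sum(counts.values())
        # Subtract a fixed discount from every observed type...
        probs = {w: (c - discount) / total for w, c in counts.items()}
        # ...and pool the held-out mass onto <unk>, so unseen words
        # get nonzero probability instead of zero.
        probs["<unk>"] = discount * len(counts) / total
        return probs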