> In either case, if Alexander included the parallel training data in LM
> data, he should not be seeing more or less <unk> using SRI or KenLM as
> they currently are.  The <unk> penalty should only impact relative
> ranking but KenLM's inclusion of backoff at <unk> should cause better
> hypotheses on average.

That would be correct if pass-through never competed with translation
in Moses. I think (but am not certain) that pass-through does compete
with translation whenever a source word cannot be covered by a
single-word phrase but can be covered by a multi-word phrase (this is
what I meant when I talked about producing the minimal/maximal number
of LM-OOVs given the loaded phrases). However, that is not how I
noticed the original problem with Moses+KenLM logprob = 0; there I
actually did not include all of the parallel data in the LM, which I
wouldn't normally recommend doing.
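A toy sketch of the ranking effect (not the Moses or KenLM API; all
words and scores below are made up for illustration): if the LM scores
an OOV at logprob 0, a pass-through hypothesis carrying that LM-OOV
costs nothing and can beat a translated hypothesis, whereas a real
<unk> penalty demotes it.

```python
# Toy unigram "LM": log10 probabilities for in-vocabulary words.
# Everything here is invented for illustration only.
LM = {"the": -1.0, "black": -2.0, "cat": -1.5, "tomcat": -2.5}
UNK = -100.0  # a proper <unk> log-probability (penalizes LM-OOVs)

def score(words, unk_logprob):
    """Sum unigram log10 probs; OOV words receive unk_logprob."""
    return sum(LM.get(w, unk_logprob) for w in words)

# Hypothesis A: pass-through keeps an untranslatable source word
# ("Kater"), which is an LM-OOV.  Hypothesis B: a multi-word phrase
# covers the same source word with an in-vocabulary translation.
hyp_a = ["the", "Kater"]
hyp_b = ["the", "tomcat"]

# With a broken <unk> score of 0, the OOV costs nothing and A wins:
assert score(hyp_a, 0.0) > score(hyp_b, 0.0)
# With a real <unk> penalty, the translated hypothesis B wins:
assert score(hyp_b, UNK) > score(hyp_a, UNK)
```

Since the two hypotheses differ in their number of LM-OOVs, the <unk>
score changes the relative ranking rather than acting as a constant
offset.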

Cheers, Alex

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support