Philipp Koehn <pkoehn@...> writes:

> 
> Hi,
> 
> can I ask a dumb question -
> where do these unknown words come from?
> 
> Obviously there are words that are unknown in the source,
> hence placed verbatim in the output, which will likely
> be unknown to the language model. But there is really not
> much choice about having them or not (besides -drop-unknown).
> All translations will have them.
> 
> Otherwise, all words in the translation model should be known.
> 
> So, what is the choice here?
> 
> -phi
> 

Hi Philipp,

I can give you another instance where <unk> matters. I played around with
integrating external knowledge through additional translation models, along the
lines of 

Chen et al. (2007). Multi-Engine Machine Translation with an Open-Source Decoder
for Statistical Machine Translation. WMT 2007.

With this approach, the translation model(s) *do* produce words unknown to the
language model, and the probability of <unk> has quite a big effect. In one
experiment, setting <unk> artificially low (-100) produced better results (by
about 0.5 BLEU percentage points) than just passing the "-unk" parameter to
SRILM.
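
The post does not say how the low <unk> score was set. One possible way,
sketched here under the assumption that the language model is stored in ARPA
text format and that -100 is meant as a log10 probability, is to rewrite the
<unk> entry in the unigram section. The function name set_unk_logprob is
hypothetical and is not part of SRILM or Moses.

```python
def set_unk_logprob(arpa_text, logprob=-100.0, unk="<unk>"):
    """Return ARPA text with the unigram log10 probability of `unk` replaced.

    Assumption: the model is a plain-text ARPA file in which each unigram
    line reads "logprob<TAB>word[<TAB>backoff]".
    """
    out = []
    in_unigrams = False
    for line in arpa_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("\\"):
            # Section markers look like "\data\", "\1-grams:" and "\end\".
            in_unigrams = stripped == "\\1-grams:"
        elif in_unigrams and stripped:
            fields = stripped.split()
            # Only the unigram entry for the unknown-word token is changed;
            # an existing backoff weight is carried over untouched.
            if len(fields) >= 2 and fields[1] == unk:
                fields[0] = f"{logprob:g}"
                line = "\t".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"
```

A model that has no <unk> unigram comes back unchanged, so this only has an
effect on a model built with an unknown-word token in the first place. The
result would be written to a new file and that file given to the decoder in
place of the original model.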

best,
Rico

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
