Philipp Koehn wrote:

> this is not correct - LM cost is in the future cost estimate.
> Obviously, this is a rather low probability, depending
> on if the language model was trained with open or
> closed vocabulary.
And also whether the word is unknown to the LM or not, yes? Typically there are many more words in the language model's vocabulary than in the phrase table.

> The reordering of unknown words often causes some
> strange reordering, due to the fact that an unknown word
> creates an unknown context for following words, and some
> words may prefer more than others to appear in such an
> unknown context.

These issues suggest to me that there might be some gain in dividing unknown words into a number of different classes. (I don't mean Moses itself would do this, but that it would be a pair of pre- and post-processing steps that swap real words for a few placeholder tokens.) The classification could be quite simple (UNK_NUM vs. UNK_ALPHA vs. UNK_MIXED) or a more sophisticated unsupervised statistical model. Has anyone tried anything like this, specifically with Moses systems?

Thanks.

- John D. Burger
  MITRE

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
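P.S. To make the idea above concrete, here is a rough sketch of what the pre-processing side could look like. This is not part of Moses; the function names, the three class labels, and the idea of carrying the originals along for a post-processing restore step are just my own illustration of the simple variant proposed above.

```python
def unk_class(token):
    """Map a token to one of three coarse unknown-word classes."""
    if token.isdigit():
        return "UNK_NUM"
    if token.isalpha():
        return "UNK_ALPHA"
    return "UNK_MIXED"

def replace_unknowns(tokens, vocabulary):
    """Swap out-of-vocabulary tokens for class placeholders before decoding,
    keeping the originals (in order) so a post-processing step can put them
    back into the decoder output afterwards."""
    replaced, originals = [], []
    for tok in tokens:
        if tok in vocabulary:
            replaced.append(tok)
        else:
            replaced.append(unk_class(tok))
            originals.append(tok)
    return replaced, originals

# e.g. with vocabulary {"the", "cat"}:
#   replace_unknowns(["the", "cat", "42", "foo", "x9"], {"the", "cat"})
#   -> (["the", "cat", "UNK_NUM", "UNK_ALPHA", "UNK_MIXED"], ["42", "foo", "x9"])
```

The point of the class split is that the LM and reordering model then see a handful of placeholder types with sensible training statistics, rather than one generic unknown context.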
