Philipp Koehn wrote:

> this is not correct - LM cost is in the future cost estimate.
> Obviously, this is a rather low probability, depending
> on if the language model was trained with open or
> closed vocabulary.
And also whether the word is unknown to the LM or not, yes? Typically there are many more words in the language model's vocabulary than in the phrase table.

> The reordering of unknown words often causes some
> strange reordering, due to the fact that an unknown word
> creates an unknown context for following words, and some
> words may prefer more than others to appear in such an
> unknown context.

These issues suggest to me that there might be some gain in dividing unknown words into a number of different classes. (I don't mean Moses itself would do this, but that it would be a pair of pre- and post-processing steps that swap real words for a few placeholder tokens.) The classification could be quite simple (UNK_NUM vs. UNK_ALPHA vs. UNK_MIXED) or a more sophisticated unsupervised statistical model. Has anyone tried anything like this, specifically with Moses systems?

Thanks.

- John D. Burger
  MITRE

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
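P.S. To make the idea above concrete, here is a rough sketch of what the pre-processing side could look like. This is not part of Moses; the function names, the three class labels, and the idea of carrying the originals along for a post-processing restore step are just my own illustration of the simple variant proposed above.

```python
def unk_class(token):
    """Map a token to one of three coarse unknown-word classes."""
    if token.isdigit():
        return "UNK_NUM"
    if token.isalpha():
        return "UNK_ALPHA"
    return "UNK_MIXED"

def replace_unknowns(tokens, vocabulary):
    """Swap out-of-vocabulary tokens for class placeholders before decoding,
    keeping the originals (in order) so a post-processing step can put them
    back into the decoder output afterwards."""
    replaced, originals = [], []
    for tok in tokens:
        if tok in vocabulary:
            replaced.append(tok)
        else:
            replaced.append(unk_class(tok))
            originals.append(tok)
    return replaced, originals

# e.g. with vocabulary {"the", "cat"}:
#   replace_unknowns(["the", "cat", "42", "foo", "x9"], {"the", "cat"})
#   -> (["the", "cat", "UNK_NUM", "UNK_ALPHA", "UNK_MIXED"], ["42", "foo", "x9"])
```

The point of the class split is that the LM and reordering model then see a handful of placeholder types with sensible training statistics, rather than one generic unknown context.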
