Subject: how is calculation of the language model costs performed? Date: 30 April 2013 14:18:53 GMT+02:00 To: [email protected]
Hi,

We are using the Moses decoder to apply normalisation to sloppy text input. We have manually built a phrase table containing different possible normalised versions of our input words, and assigned equal probabilities to all the alternatives, in order to let the language model decide on the best normalised version. As a toy example, we made a phrase table containing three alternatives for the Dutch word "vndg" (an abbreviation of "vandaag"):

vndg vaandag
vndg vandaag
vndg vndg

In the output and logging, we see that the language model cost for the correct normalisation (vandaag) is always higher than for the other two alternatives (vaandag/vndg), even though those do not appear in the language model at all (they are non-existing Dutch words and do not occur in the corpus the language model was trained on). This seems very strange... Is there some kind of bias towards a lower LM cost for words that do not appear in the language model at all (some kind of smoothing, maybe)? If so, how can we tune Moses to assign higher probabilities to words that do occur in the language model?

Thanks in advance!
Els Lefever
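To make the suspicion concrete: smoothed language models typically map every out-of-vocabulary word to a single <unk> token that carries its own reserved probability mass, so all OOV forms share one score. The sketch below is purely illustrative (the log-probabilities are made up, not taken from any real model) and just shows how an <unk> probability larger than a rare word's probability would reproduce the behaviour we observe:

```python
import math

# Toy backoff-style lookup: OOV words fall back to <unk>, so they all
# share one probability instead of getting zero. Whether that <unk>
# probability beats a rare in-vocabulary word's probability depends on
# how the LM was estimated; the numbers here are invented.
log10_unigram = {
    "<unk>": -3.0,        # mass reserved for unseen words
    "vandaag": -4.2,      # a word seen only rarely in training
}

def lm_cost(word: str) -> float:
    """Negative log10 probability, i.e. the 'cost' Moses reports."""
    return -log10_unigram.get(word, log10_unigram["<unk>"])

# The OOV forms get a LOWER cost than the real word, which is exactly
# the surprising pattern we see in the decoder logs:
print(lm_cost("vandaag"))   # 4.2
print(lm_cost("vaandag"))   # 3.0, cheaper despite being OOV
print(lm_cost("vndg"))      # 3.0
```

If this is what is happening, the question becomes how Moses can be made to penalise the <unk> fallback (or whether the estimate of <unk> itself should be changed when building the LM).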
_______________________________________________ Moses-support mailing list [email protected] http://mailman.mit.edu/mailman/listinfo/moses-support
