A common way is to identify these OOV (out-of-vocabulary) words before decoding and send them, for example, to a transliteration module where that is appropriate. You can then use XML markup to force the decoder to use those transliterations. As for the lexicon: the best way is to add it to the training data in the first place.
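As a rough illustration of the pre-processing step described above, the Python sketch below wraps every token that is missing from the model's source vocabulary in Moses-style XML markup carrying a transliteration. Everything in it is an assumption for illustration. `transliterate`, `mark_up_oov` and the toy Cyrillic-to-Latin `TRANSLIT` table are hypothetical and stand in for a real transliteration module. The vocabulary would in practice be collected from the source side of the phrase table. The `<n translation="...">` form follows the XML-input convention in the Moses documentation, which is worth checking for your version.

```python
from xml.sax.saxutils import escape

# Hypothetical toy transliteration table (Cyrillic -> Latin).
# A real setup would call a trained transliteration module instead.
TRANSLIT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "и": "i", "к": "k", "л": "l", "м": "m", "н": "n", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
}


def transliterate(word: str) -> str:
    """Hypothetical helper: map each character through TRANSLIT, keep
    unmapped characters (lower-cased), and restore an initial capital."""
    out = "".join(TRANSLIT.get(ch.lower(), ch.lower()) for ch in word)
    return out.capitalize() if word[:1].isupper() else out


def mark_up_oov(tokens, vocab):
    """Hypothetical helper: wrap tokens not found in `vocab` (assumed to be
    the source-side vocabulary of the phrase table) in XML markup whose
    `translation` attribute holds the transliteration."""
    out = []
    for tok in tokens:
        if tok in vocab:
            out.append(escape(tok))  # known word: pass through unchanged
        else:
            # Escape &, <, > and double quotes so the attribute stays well-formed.
            hyp = escape(transliterate(tok), {'"': "&quot;"})
            out.append(f'<n translation="{hyp}">{escape(tok)}</n>')
    return " ".join(out)


if __name__ == "__main__":
    # "Путин" is treated as OOV here, "сказал" ("said") as known.
    print(mark_up_oov(["Путин", "сказал"], {"сказал"}))
```

The marked-up sentences would then be decoded with XML input enabled, for example `moses -f moses.ini -xml-input exclusive`, assuming your Moses build supports that option. `exclusive` forces the supplied transliteration for the marked span, while `inclusive` lets it compete with any phrase-table options.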
Jörg

Suzy Howlett wrote:
> Hi,
>
> I was thinking about how unknown words are handled in Moses - namely
> being copied verbatim to the output or being dropped completely - and
> I had an idea. Would it be worth allowing the user to specify (through
> a command line argument) an external program that could be called on
> an unknown word to generate a new output? For example, if you were
> translating into English from a language that doesn't use a roman
> script, perhaps you would want to call an external program that would
> take the foreign word and produce a romanised version. This might help
> if the word is a name or a borrowing that wasn't seen in the parallel
> corpus, but the romanised version was seen in the corpus used to train
> the language model. Another example might be you have a lexicon that
> you could get a translation from as a last resort, and the external
> program consults that lexicon to return a translation.
>
> Does anyone think this would be a worthwhile addition? (Unfortunately
> I haven't yet been able to work out how to do it... suggestions
> appreciated.)
>
> Suzy
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support

-- 
Jörg Tiedemann                        [email protected]
Dep. of Linguistics and Philology     http://stp.lingfil.uu.se/~joerg/
Uppsala University                    tel: +46 (0)18 - 471 1412
Box 635, SE-751 26 Uppsala/SWEDEN     fax: +46 (0)18 - 471 1094
