Hi, I was thinking about how unknown words are handled in Moses (they are either copied verbatim to the output or dropped completely) and I had an idea. Would it be worth allowing the user to specify, through a command-line argument, an external program that would be called on each unknown word to generate a replacement? For example, if you were translating into English from a language that doesn't use a Roman script, you might want to call an external program that takes the foreign word and produces a romanised version. This could help when the word is a name or a borrowing that wasn't seen in the parallel corpus but whose romanised form was seen in the corpus used to train the language model. Another example: you might have a lexicon to fall back on as a last resort, and the external program would consult that lexicon to return a translation.
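To make the idea a bit more concrete, here is a rough sketch of the kind of external program I have in mind. Everything in it is an assumption on my part: Moses defines no such hook as far as I know, so the calling convention (unknown words passed as command-line arguments, one replacement printed per line), the `LEXICON` entries, and the toy `TRANSLIT` table are all hypothetical and only there for illustration.

```python
import sys

# Hypothetical last-resort lexicon: source word -> English translation.
LEXICON = {"собака": "dog"}

# Toy Cyrillic-to-Latin table covering only the letters used in the demo.
TRANSLIT = {"м": "m", "о": "o", "с": "s", "к": "k", "в": "v", "а": "a"}


def handle_unknown(word):
    """Return a replacement for a word the decoder could not translate."""
    # First choice: a direct translation from the lexicon.
    if word in LEXICON:
        return LEXICON[word]
    # Otherwise romanise character by character, leaving any character
    # that is missing from the table unchanged.
    return "".join(TRANSLIT.get(ch, ch) for ch in word)


if __name__ == "__main__":
    # Assumed protocol: each argument is one unknown word; print one
    # replacement per line for the caller to read back.
    for unknown in sys.argv[1:]:
        print(handle_unknown(unknown))
```

Saved under a made-up name such as `unk_handler.py`, running `python unk_handler.py москва` would print `moskva`, which the language model might then recognise even though the parallel corpus never contained the word.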
Does anyone think this would be a worthwhile addition? (Unfortunately I haven't yet been able to work out how to do it... suggestions appreciated.)

Suzy
