If you use the -report-all-factors option, you can identify unknown words in the output by their UNK factor. You can then pipe the Moses output to an external program that transforms them.
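As a rough illustration of that pipeline, here is a minimal Python sketch of such an external filter. It rests on assumptions that are not confirmed in this thread: that unknown words arrive as the surface form followed by one or more factors equal to UNK, separated by the "|" delimiter (for example haus|UNK|UNK|UNK), and that a small fallback lexicon is the transformation you want. The names FALLBACK_LEXICON, lookup, and handle_unk.py are made up for the example; check the actual output of your Moses build before relying on the token format.

```python
import sys

FACTOR_SEP = "|"    # assumed factor delimiter in the decoder output
UNK_FACTOR = "UNK"  # assumed marker attached to unknown words

# Hypothetical last-resort lexicon; replace with your own resource
# (or with a romanisation routine).
FALLBACK_LEXICON = {"haus": "house"}


def lookup(surface):
    """Return a fallback translation, or the word itself if none is known."""
    return FALLBACK_LEXICON.get(surface, surface)


def rewrite_token(token, transform):
    """Apply `transform` to tokens flagged as unknown; leave others untouched."""
    factors = token.split(FACTOR_SEP)
    if UNK_FACTOR in factors[1:]:
        # Drop the UNK factors and transform the surface form only.
        return transform(factors[0])
    return token


def rewrite_line(line, transform):
    """Rewrite every token of one whitespace-tokenised output line."""
    return " ".join(rewrite_token(tok, transform) for tok in line.split())


if __name__ == "__main__":
    # Read decoder output from stdin, write the post-processed text to stdout.
    for line in sys.stdin:
        print(rewrite_line(line, lookup))
```

It could then be used along these lines (file names are placeholders):

    moses -f moses.ini -report-all-factors < input.txt | python handle_unk.py > output.txt

Note that this only post-processes the 1-best output, so the replacement is never scored by the language model during decoding.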
-- Raphael Payen

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Suzy Howlett
Sent: 22 April 2010 07:16
To: [email protected]
Subject: [Moses-support] idea for handling unknown words

Hi,

I was thinking about how unknown words are handled in Moses - namely being copied verbatim to the output or being dropped completely - and I had an idea. Would it be worth allowing the user to specify (through a command line argument) an external program that could be called on an unknown word to generate a new output?

For example, if you were translating into English from a language that doesn't use a roman script, perhaps you would want to call an external program that would take the foreign word and produce a romanised version. This might help if the word is a name or a borrowing that wasn't seen in the parallel corpus, but the romanised version was seen in the corpus used to train the language model.

Another example might be you have a lexicon that you could get a translation from as a last resort, and the external program consults that lexicon to return a translation.

Does anyone think this would be a worthwhile addition? (Unfortunately I haven't yet been able to work out how to do it... suggestions appreciated.)

Suzy

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
