[Moses-support] idea for handling unknown words

Suzy Howlett Wed, 21 Apr 2010 23:16:36 -0700

Hi,

I was thinking about how Moses handles unknown words (they are either
copied verbatim to the output or dropped completely), and I had an
idea. Would it be worth letting the user specify, through a command
line argument, an external program that Moses calls on each unknown
word to generate a replacement output? For example, if you were
translating into English from a language that doesn't use the Roman
script, you might want to call an external program that takes the
foreign word and produces a romanised version. This could help when
the word is a name or a borrowing that wasn't seen in the parallel
corpus but whose romanised form was seen in the corpus used to train
the language model. As another example, you might have a lexicon that
can supply a translation as a last resort, and the external program
would consult that lexicon to return a translation.
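
To make the idea a bit more concrete, here is a rough sketch of what such
an external program might do. It is only an illustration and not part of
Moses: the lexicon entries, the tiny Cyrillic-to-Latin table, and the
function names (romanise, handle_unknown) are all made up, and the table
does not follow any standard transliteration scheme.

```python
# Hypothetical last-resort lexicon; the entry is illustrative only.
LEXICON = {"кошка": "cat"}

# Hypothetical, deliberately tiny Cyrillic-to-Latin table (assumption:
# a real tool would use a complete, standard transliteration scheme).
TRANSLIT = {"м": "m", "о": "o", "с": "s", "к": "k", "в": "v", "а": "a"}


def romanise(word):
    """Map each character through TRANSLIT, leaving unmapped ones as-is."""
    return "".join(TRANSLIT.get(ch, ch) for ch in word.lower())


def handle_unknown(word):
    """Return a replacement for a word the decoder could not translate.

    Try the lexicon first; if the word is not there, fall back to a
    romanised form, which the language model may have seen.
    """
    if word in LEXICON:
        return LEXICON[word]
    return romanise(word)
```

With these made-up tables, handle_unknown("кошка") comes from the lexicon,
while handle_unknown("Москва") falls through to the romaniser. How the
decoder would actually invoke such a program is the part I have not
worked out.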


Does anyone think this would be a worthwhile addition? (Unfortunately  
I haven't yet been able to work out how to do it... suggestions  
appreciated.)

Suzy
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
