I've used Moses to build a letter-to-sound rules "black box" for use in text-to-speech and ASR systems. The translation model is trained from phrase pairs with orthography (ordinary written words) as the source data and phonetic symbols as the target data. In my custom tokenization script, I replace all spaces (word boundaries) with a reserved character and then insert spaces between each character, so that each character in the orthographic and phonetic spellings is one token.
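Concretely, my script does something like this (a rough Python equivalent; the reserved character "_" and the function names char_tokenize/char_detokenize are illustrative, not my actual code):

    # Rough Python equivalent of my tokenization step. The reserved
    # boundary character "_" is illustrative; the real script may use
    # a different character.
    BOUNDARY = "_"

    def char_tokenize(text):
        # Replace word boundaries (spaces) with the reserved character,
        # then make every character its own space-delimited token.
        return " ".join(text.replace(" ", BOUNDARY))

    def char_detokenize(tokens):
        # Invert the mapping after translation: drop the token-separating
        # spaces and restore real word boundaries.
        return tokens.replace(" ", "").replace(BOUNDARY, " ")

    # "hello world" -> "h e l l o _ w o r l d" and back again
    assert char_tokenize("hello world") == "h e l l o _ w o r l d"
    assert char_detokenize(char_tokenize("hello world")) == "hello world"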
I've built a working translation model using the standard train-factored-phrase-model.perl script. My BLEU scores are in the 0.97 range, though I suspect that figure is not very meaningful here; in actual use, the translations are excellent. I'm now looking to reduce training time and to speed up translation at runtime. In this data, the "word" (token) order never changes between source and target. So, would it make more sense to use the recaser training script, or possibly another script that eliminates both reordering and language model processing, if such a script exists? Can you point me to training/tuning scripts and training parameters appropriate for this use?

Thanks,
Tom
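P.S. For what it's worth, my current understanding (please correct me if this is wrong) is that reordering can be switched off at decode time by setting the distortion limit to zero, along these lines:

    # force monotone decoding (no reordering) at runtime
    moses -f moses.ini -dl 0 < input.tok > output.tok

    # or the equivalent setting in moses.ini:
    [distortion-limit]
    0

I don't know whether the language model pass can be skipped in a similarly simple way, which is part of what I'm asking.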
