I've used Moses to build a letter-to-sound rules "black box" for use in text-to-speech and ASR systems. The translation model is trained from phrase pairs with orthography (ordinary written words) as the source data and phonetic symbols as the target data. In my custom tokenization script, I replace all spaces (word boundaries) with a reserved character and then insert spaces between each character, so that each character in the orthographic and phonetic spellings is one token.
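Concretely, my script does something like this (a rough Python equivalent; the reserved character "_" and the function names char_tokenize/char_detokenize are illustrative, not my actual code):

    # Rough Python equivalent of my tokenization step. The reserved
    # boundary character "_" is illustrative; the real script may use
    # a different character.
    BOUNDARY = "_"

    def char_tokenize(text):
        # Replace word boundaries (spaces) with the reserved character,
        # then make every character its own space-delimited token.
        return " ".join(text.replace(" ", BOUNDARY))

    def char_detokenize(tokens):
        # Invert the mapping after translation: drop the token-separating
        # spaces and restore real word boundaries.
        return tokens.replace(" ", "").replace(BOUNDARY, " ")

    # "hello world" -> "h e l l o _ w o r l d" and back again
    assert char_tokenize("hello world") == "h e l l o _ w o r l d"
    assert char_detokenize(char_tokenize("hello world")) == "hello world"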
I've built a working translation model using the standard train-factored-phrase-model.perl script. My BLEU scores are in the 0.97 range, though I suspect that figure is not very meaningful here; in actual use, the translations are excellent. I'm now looking to reduce training time and to speed up translation at runtime. In this data, the "word" (token) order never changes between source and target. So, would it make more sense to use the recaser training script, or possibly another script that eliminates both reordering and language model processing, if such a script exists? Can you point me to training/tuning scripts and training parameters appropriate for this use?

Thanks,
Tom
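P.S. For what it's worth, my current understanding (please correct me if this is wrong) is that reordering can be switched off at decode time by setting the distortion limit to zero, along these lines:

    # force monotone decoding (no reordering) at runtime
    moses -f moses.ini -dl 0 < input.tok > output.tok

    # or the equivalent setting in moses.ini:
    [distortion-limit]
    0

I don't know whether the language model pass can be skipped in a similarly simple way, which is part of what I'm asking.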
