--- Moshe Yudkowsky <[EMAIL PROTECTED]> wrote:
<SNIP>
> The real trick is to get the correct prosody. Here's three sentences
> with the same words but each with different prosody:
>
> "I said 'yes.'"
>
> "I said yes?"
>
> "_I_ said '_yes_'"???!!
>
> Both formative and concatenative systems add prosody. Adding prosody
> to whole-word concatenative systems is difficult.

The thing is that _people_ don't do text to speech. If you were to simply read one word at a time you'd sound bad too. Try it: if, ... you, ... were, ... to, ... simply, ... read, ... You sound like a robot.

No, we people know what it is we are trying to communicate. If you want a synthetic voice to sound natural, you will have to tell the software the _intent_ of the words, not just the words. You would need a markup language for that:

    <emph> I </emph> said <quote><questionword> yes </questionword></quote>

Now the system can apply some transformations to the pitch, speed and loudness. For interactive systems markup works because the software generating the text "knows" _why_ it is generating the text.

Reading a book for the blind is a much harder problem. The TTS system has to do the same job as a voice actor, which even includes understanding the emotions of characters in a novel. Very hard to do for a computer. But interactive systems can use markup to get the expression right.

And don't put down Festival. Many (most?) of the commercial systems _are_ Festival.

=====
Chris Albertson

_______________________________________________
Asterisk-Users mailing list
http://lists.digium.com/mailman/listinfo/asterisk-users
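
To make the markup idea above concrete, here is a rough sketch of how intent tags could be turned into pitch, speed and loudness adjustments per word. It is only an illustration, not how Festival or any commercial engine does it: the tag names are the ones from the example sentence, while the `Prosody` class, the `TAG_RULES` table, the `annotate` function and all the multiplier values are hypothetical. It also assumes the tags are properly nested, since it uses an XML parser.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Prosody:
    pitch: float = 1.0   # multiplier on the baseline pitch
    rate: float = 1.0    # multiplier on the speaking rate
    volume: float = 1.0  # multiplier on the loudness


# Hypothetical per-tag adjustments; the numbers are illustrative only.
TAG_RULES = {
    "emph": Prosody(pitch=1.2, rate=0.9, volume=1.3),
    "quote": Prosody(pitch=1.05, rate=0.95, volume=1.0),
    "questionword": Prosody(pitch=1.4, rate=1.0, volume=1.0),
}


def apply_tag(current, tag):
    """Combine the inherited prosody with the rule for this tag, if any."""
    rule = TAG_RULES.get(tag)
    if rule is None:
        return current
    return Prosody(
        current.pitch * rule.pitch,
        current.rate * rule.rate,
        current.volume * rule.volume,
    )


def annotate(markup):
    """Return a list of (word, Prosody) pairs for a marked-up sentence."""
    # Wrap in a neutral root element so plain text between tags is allowed.
    root = ET.fromstring(f"<utterance>{markup}</utterance>")
    words = []

    def walk(node, inherited):
        here = apply_tag(inherited, node.tag)
        for word in (node.text or "").split():
            words.append((word, here))
        for child in node:
            walk(child, here)
            # Text after a child's closing tag belongs to the enclosing node.
            for word in (child.tail or "").split():
                words.append((word, here))

    walk(root, Prosody())
    return words


if __name__ == "__main__":
    sentence = "<emph>I</emph> said <quote><questionword>yes</questionword></quote>"
    for word, prosody in annotate(sentence):
        print(word, prosody)
```

Nested tags multiply together, so "yes" picks up both the quote and the question adjustments while "said" stays at the baseline. A real engine would then have to map these multipliers onto its own synthesis parameters, which is outside this sketch.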
