At 15:41 2003-07-15 -1000, Matthew John Darnell wrote:
> Why hasn't someone found 50 people who sound alike, put them in sound
> studios and recorded the 10,000 most commonly used words?  You would also
> record different forms of the 1,000 most common words, i.e. leading,
> trailing, question, etc.
>
> You can synthesize the other 0.05% when you run into them.  With hard drives
> so big, processors so fast, and EXT3 able to handle 30,000+ files in a
> single directory, that seems like the way to do it.
>
> You could sell it for BIG bucks.

Text-to-speech (TTS) is usually either formant-based, created by synthesizing sounds from a model of the vocal tract's resonances (formants), or concatenative, created by concatenating samples of actual recorded speech.
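Synthesis-by-rule of this kind is usually called formant synthesis. As a rough illustration of the idea, the sketch below feeds a pulse train (standing in for the vocal cords) through a cascade of second-order resonators, one per formant, in the style of the Klatt synthesizer. This is a minimal sketch under stated assumptions: the function names are hypothetical, the formant frequencies and bandwidths are approximate textbook values for an "ah" vowel, and real formant synthesizers control many more parameters (voicing source shape, noise, nasal resonances) that change over time.

```python
import math


def resonator(signal, freq_hz, bandwidth_hz, sample_rate):
    """Second-order digital resonator (Klatt-style) modelling a single formant."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # normalizes the gain at 0 Hz to 1
    out, y1, y2 = [], 0.0, 0.0
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y1, y2 = y, y1  # shift the two-sample output history
    return out


def synthesize_vowel(formants, f0_hz=120.0, duration_s=0.3, sample_rate=16000):
    """Crude vowel: impulse-train 'glottal' source through cascaded formant resonators."""
    n = int(round(duration_s * sample_rate))
    period = int(round(sample_rate / f0_hz))  # samples per pitch period
    signal = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    for freq, bw in formants:
        signal = resonator(signal, freq, bw, sample_rate)
    peak = max(abs(s) for s in signal) or 1.0
    return [s / peak for s in signal]  # scale to the range [-1, 1]


# Approximate (frequency, bandwidth) pairs in Hz for an "ah" vowel; illustrative only.
AH = [(730.0, 90.0), (1090.0, 110.0), (2440.0, 170.0)]
samples = synthesize_vowel(AH)
print(len(samples))  # 4800 samples = 0.3 s at 16 kHz
```

Nothing here is recorded speech: the vowel's identity comes entirely from the resonator settings, which is why formant systems are compact but tend to sound less natural than concatenative ones.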


However, concatenative TTS usually works with small fragments of speech, such as diphones, rather than entire words. This needs far less storage, and it lets the system pick each unit so that it matches the units that precede and follow it.
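Picking units that fit their neighbours is commonly done with unit selection: each position in the target utterance has several recorded candidates, and a Viterbi-style dynamic-programming search chooses the sequence that minimizes a target cost (how well a unit fits what is wanted) plus a join cost (how smoothly adjacent units connect). The sketch below is a toy under stated assumptions: each unit carries only a label, one pitch value and a hypothetical recording ID, and both cost functions are invented for illustration; real systems compare spectral and prosodic features at the unit boundaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    label: str       # e.g. a diphone name such as "h-e"
    pitch_hz: float  # one pitch value per fragment (a simplification)
    source: str      # hypothetical ID of the recording it was cut from


def target_cost(unit, target_pitch):
    # Hypothetical: penalize distance from the desired pitch.
    return abs(unit.pitch_hz - target_pitch) / 10.0


def join_cost(left, right):
    # Hypothetical: fragments cut from the same recording join seamlessly;
    # otherwise charge a fixed splice penalty plus the pitch jump.
    if left.source == right.source:
        return 0.0
    return abs(left.pitch_hz - right.pitch_hz) / 5.0 + 1.0


def select_units(candidates, target_pitches):
    """Viterbi search; candidates[i] lists the recorded units usable at position i."""
    # best[i][j] = (lowest total cost ending in candidates[i][j], index of predecessor)
    best = [[(target_cost(u, target_pitches[0]), None) for u in candidates[0]]]
    for i in range(1, len(candidates)):
        row = []
        for u in candidates[i]:
            tc = target_cost(u, target_pitches[i])
            row.append(min(
                (best[i - 1][k][0] + join_cost(prev, u) + tc, k)
                for k, prev in enumerate(candidates[i - 1])
            ))
        best.append(row)
    # Trace back from the cheapest unit at the last position.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return path[::-1]


candidates = [
    [Unit("h-e", 120.0, "rec1"), Unit("h-e", 140.0, "rec2")],
    [Unit("e-l", 118.0, "rec1"), Unit("e-l", 135.0, "rec3")],
    [Unit("l-o", 110.0, "rec1"), Unit("l-o", 100.0, "rec2")],
]
path = select_units(candidates, [125.0, 120.0, 110.0])
print([u.source for u in path])  # ['rec1', 'rec1', 'rec1']
```

Because the join cost rewards units that were recorded next to each other, the search prefers long contiguous stretches of one recording when they fit the target, which is one reason small units can still sound natural.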

The real trick is getting the prosody (pitch, timing, and stress) right. Here are three sentences with the same words, each with different prosody:

"I said 'yes.'"

"I said yes?"

"_I_ said '_yes_'"???!!

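Both formant and concatenative systems add prosody, but adding it to a whole-word concatenative system is difficult: each recorded word carries the pitch and timing it was spoken with, and changing them convincingly is hard.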
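To make the prosody point concrete, here is a toy sketch that assigns one pitch target (F0, in Hz) to each word of "I said yes": a gentle downward drift across the phrase, a final fall for a statement, a final rise for a yes/no question, and a boost on emphasized words. The function name and every number in it are assumptions chosen for illustration, not rules taken from any particular TTS product.

```python
def f0_contour(words, kind, emphasized=(), base_hz=120.0):
    """Return (word, pitch target in Hz) pairs under very simplified intonation rules."""
    # Declination: pitch drifts down by 2 Hz per word across the phrase.
    contour = [base_hz - 2.0 * i for i in range(len(words))]
    for i in emphasized:
        contour[i] += 25.0  # stressed words are raised
    if kind == "statement":
        contour[-1] -= 20.0  # final fall
    elif kind == "question":
        contour[-1] += 40.0  # final rise
    else:
        raise ValueError(f"unknown sentence kind: {kind!r}")
    return list(zip(words, contour))


words = ["I", "said", "yes"]
print(f0_contour(words, "statement"))                 # final "yes" falls to 96 Hz
print(f0_contour(words, "question"))                  # final "yes" rises to 156 Hz
print(f0_contour(words, "question", emphasized=(0, 2)))
```

With whole-word recordings, each stored "yes" has the contour it was spoken with, so hitting these targets means either pitch-shifting the recording or storing a separate take for every prosodic variant, which is why the original proposal needed leading, trailing, and question forms.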

If you're in a buying mood, there are some excellent TTS systems available. For example, Rhetorical (http://www.rhetorical.com) has some excellent voices, including the funniest TTS voice currently available: the "Southern California female" voice. I use it for non-serious demos ("That's so totally awesome.").

Commercial TTS is actually very intelligible and perfectly adequate for many tasks.




-- Moshe Yudkowsky Disaggregate 2952 W Fargo Chicago, IL 60645 USA

 www.Disaggregate.com
 [EMAIL PROTECTED]
 +1 773 764 8727

_______________________________________________
Asterisk-Users mailing list
[EMAIL PROTECTED]
http://lists.digium.com/mailman/listinfo/asterisk-users
