I'd be curious to know which language that document was translated from, or
which language the writer speaks as his mother tongue.

Louis Gosselin


-----Original Message-----
From: David [mailto:[email protected]] 
Sent: Sunday, June 17, 2012 2:56 PM
To: WE English mailing list
Subject: Part 3B of 3, Getting To Know Your Computer - Speech Synthesizer

(C) Copyright, David (No) - June 2012
 
---This is the final part of the article.---
 
 
Finalizing Your Speech Synthesizer
When you have decided on the exact technique of speech production, built the
whole sound library needed, created all the rules for pronunciation and
modulation, and constructed the software to handle all of this, you are the
happy owner of a new synthetic voice. As you have learned, it could be fully
synthetic, or built partially from pre-recorded human sounds. You can now put
the voice up for sale on the market. Well, almost.
There still remain a couple of decisions to be made. Should you let the voice
out the door as a single-voice product? Or should you bundle it up with a few
more voices?
Many manufacturers decide to bundle several voices together. Typically, in such
a bundle, you would find at least one female and one male voice. Many times,
you might even find several versions of the two. A few manufacturers even
offer a child voice. And at least one has a voice made up of dog barking -
should you ever want such a thing in your projects.
How do you go about making more voices? This greatly depends on the technique
you decided to go for. If your synthetic voice is a digitized one, you will
need more people to narrate all the words in your synthetic voice's
vocabulary. You would then have a female narrator record the whole set of
words, and a male narrator do the exact same job. Your software then picks the
right sound library, according to the request from the end-user.
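The idea of per-voice sound libraries can be sketched in a few lines of Python. This is purely illustrative: the library names, file names and the recordings_for function are all made up for the example, not part of any real product.

```python
# Hypothetical sketch of a digitized synthesizer's voice selection:
# each voice has its own library of pre-recorded word files,
# and the software picks the library matching the requested voice.

VOICE_LIBRARIES = {
    "female": {"hello": "anna_hello.wav", "world": "anna_world.wav"},
    "male":   {"hello": "john_hello.wav", "world": "john_world.wav"},
}

def recordings_for(text, voice):
    """Return the list of recording files needed to speak `text`."""
    library = VOICE_LIBRARIES[voice]
    return [library[word] for word in text.lower().split()]

# A real product would now stream these files to the sound card.
print(recordings_for("Hello world", "male"))
```

Note how the same text maps to a completely different set of recordings depending on which narrator's library is chosen.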
Did you decide to go for the fully electronic voice? This is the easiest to
multiply.
A good bit of tweaking - adjusting speed, duration, volume and pitch - of the
many individual tones included in your sound library will readily make your
voice sound female, male or even childish. You can quite quickly have a deep
voice, a thin one, and a really shouty one. And don't forget to include your
very first one - robotic as it is. :)
If you, on the other hand, decided to go for the hybrid technique, there could
be a couple of ways of building new voices. Again, you could have several
human narrators read a text, and then fragment the recordings into the
word-fractions you need for each voice. You could further make several
adjustments to each of the fractional recordings, which would quickly make a
voice sound slightly different. All readers who are old enough to have played
with a tape recorder with speed adjustment will know that it is quite easy to
make Mom's voice turn really deep-sounding.
Or, you could have Daddy sound like a little boy, simply by speeding up the
playback.
Similarly, in your laboratory, you can perform a load of adjustments on the
recorded voices, and have them sound different.
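The tape-recorder trick above can be shown numerically. The sketch below, in plain Python, fakes "speeding up the tape" by keeping only every second sample of a tone, and estimates the pitch from the zero-crossing rate. All function names are invented for the example; a real studio tool would of course work on recorded speech, not a sine wave.

```python
import math

def sine_tone(freq_hz, seconds=1.0, sample_rate=8000):
    """Generate a plain sine tone as a list of samples."""
    n = int(seconds * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

def speed_up(samples, factor=2):
    """Crude 'tape speed-up': keep only every factor-th sample.
    Played back at the original rate, the sound gets shorter and higher."""
    return samples[::factor]

def pitch_estimate(samples, sample_rate=8000):
    """Rough pitch from the zero-crossing rate (two crossings per cycle)."""
    crossings = sum(1 for a, b in zip(samples, samples[1:])
                    if (a < 0) != (b < 0))
    return crossings * sample_rate / (2 * len(samples))

deep = sine_tone(200)          # stands in for a deep 200 Hz voice
squeaky = speed_up(deep, 2)    # doubling the speed roughly doubles the pitch
print(round(pitch_estimate(deep)), round(pitch_estimate(squeaky)))
```

Doubling the playback speed doubles the estimated pitch - exactly why Daddy ends up sounding like a little boy.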
Building A Speech Synthesizer
The term "Speech Synthesizer" is basically another term for a bundle of
synthetic voices. You would typically give the different voices human names.
Your synthesizer, on the other hand, you would name after your company or
project.
For instance, Microsoft has built a speech synthesizer. It is called
Microsoft. It holds several voices, like Mary, Mike and Sam. Another company,
AT&T, built a speech synthesizer named "AT&T Natural Voices". It holds voices
like Crystal, Mike, Mel, Julia, and Ray. Nuance names its synthesizer
Scansoft, and it holds voices like Samantha, Daniel, Tom, Nora and Nanna.
Another manufacturer, NeoSpeech, has voices like Kate and Paul included in its
synthesizer. To distinguish the many voices and synthesizers, or
manufacturers, we often refer to them as "Microsoft Mike", "AT&T Crystal", or
"Eloquence Sandy". This way, we easily know whether the Mike voice from
Microsoft, or the Mike voice from AT&T, is in question. Since any manufacturer
can name their voices whatever they want, anyone could have a voice named
Mike. But if you listen to Microsoft Mike and AT&T Mike, you will right away
hear that they are two totally different voices.
Interfacing Your Voice
Is your product ready for shipping now? Hang on for a moment. There is just
one small techie thing left. So far, we have made a synthesizer and voices,
but we haven't yet given the end-user any way of "communicating" with the
voice. We need to provide a way for the end-user to choose which of the voices
in our synthesizer she wants to listen to. Further, a way to send text to the
selected voice for narration. Also, the user should be offered the chance to
at least slow down or speed up the narration, and maybe alter the pitch.
Correctly done, our voice should signal to the computer when it is ready to
receive text and commands, and when not to disturb it. And, maybe quite
importantly, it should include a feature for the user to stop the speech at
any time.
All of this controlling is what we call the "interface" of the speech
synthesizer.
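To make the requirements above concrete, here is a schematic Python model of such an interface: voice selection, sending text, a busy flag, and a stop control. This is not the real SAPI API - the class and all its method names are invented for illustration only.

```python
# Illustrative model of a speech-synthesizer interface (not a real API).

class SynthesizerInterface:
    def __init__(self, voices):
        self.voices = voices       # e.g. ["Mary", "Mike", "Sam"]
        self.current = voices[0]   # the voice currently selected
        self.rate = 5              # narration speed, on some arbitrary scale
        self.pitch = 5
        self.busy = False          # "do not disturb" signal to the computer
        self.spoken = []           # log of what was narrated, for the demo

    def select_voice(self, name):
        """Let the end-user choose one of the bundled voices."""
        if name not in self.voices:
            raise ValueError("unknown voice: " + name)
        self.current = name

    def speak(self, text):
        """Accept text for narration, but only when not busy."""
        if self.busy:
            raise RuntimeError("synthesizer not ready")
        self.busy = True
        self.spoken.append((self.current, text))  # real engine plays audio here
        self.busy = False

    def stop(self):
        """Let the user cut the speech off at any time."""
        self.busy = False

synth = SynthesizerInterface(["Mary", "Mike", "Sam"])
synth.select_voice("Mike")
synth.speak("Hello there")
```

A standard like SAPI essentially pins down exactly this kind of contract, so any program knows how to talk to any compliant voice.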
There are many ways of interfacing a speech synthesizer. The most commonly
used standard is called SAPI. Letting your speech synthesizer meet the
requirements of the SAPI interface will ensure that it can be used by much of
the software on the market.
SAPI voices come in two main flavors: SAPI 4, and SAPI 5.
Some voices come in both versions, others only in one of them. The SAPI 4
interface offered an extensive range of adjustments for volume, speed and
pitch. SAPI 5 voices have a somewhat more limited adjustment for each of these
features. Unfortunately, you might often find that adjusting a feature in SAPI
5 results in either too much or too little. Say you have set your volume to 5.
Adjusting it down to 4, it becomes hard to hear. Setting it to 6, your
eardrums are blown. And since you are not offered anything in between 4 and 5,
or 5 and 6, you end up not using the volume adjustment very often. The same
goes for the other parameters. When it comes to the listening experience, it
is all a matter of personal taste and preference. Some will claim that the
sound has become a bit clearer in SAPI 5 voices, but that the SAPI 4 voices
had better modulation. Some SAPI 5 versions modulate their narration too
eagerly. The modulation issue might not really be related to the SAPI version
in itself. Rather, the manufacturers who upgraded their SAPI 4 voices might
have thought they would do a bit of maintenance on their product, since they
had to meddle with it anyway. Unfortunately, such upgrading has in some cases
resulted in a less pleasant-sounding voice for long-term narrating.
Yet, not all manufacturers want to 'expose' their speech synthesizer to just
any software or user interaction. The manufacturer could offer a "dedicated"
speech synthesizer.
This kind of synthesizer can only be reached from the software for which it
has been dedicated. So, while installing a SAPI voice in general means that
you can reach the voice from any software that supports SAPI, a dedicated
speech synthesizer can only be reached from a given piece of software (like a
screen reader).
As always in the computer world, there is a chance of exceptions. Even some
SAPI voices on the market are somewhat 'locked' to a given piece of software.
The voices from Acapela are one example. Acapela voices can basically only be
used with the software they were bought with, unless you buy a special license
that opens them up for use with other software on your computer. Still, we
would not necessarily refer to this kind of SAPI voice as a real dedicated
voice.
A manufacturer might include extra controlling capabilities in a dedicated
speech synthesizer - controlling that would need more technical handling than
what is possible or generally accepted in the SAPI standard. Or he might offer
a better-quality, faster-responding, or otherwise modified version of his
voice in the dedicated version. He might offer his voices as SAPI, as
dedicated ones, or as both.
Since a dedicated voice cannot be reached from anything other than the hosting
software, there is less chance of interference from other processes on your
computer. But then again, you are locked to using the voices provided in the
synthesizer from within the hosting software. Therefore, most voices on the
market are SAPI voices, leaving them open to receive input from any software
that supports this interface.
Hardware Or Software Synthesizers
So far, we have been discussing the construction and interfacing of voices and
synthesizers directly on the computer. They are stored on the hard disk, and
reached directly inside your computer. In older times (a decade or so ago),
when hard disk space was limited and computers ran slowly, the computer simply
did not have enough resources to run a speech synthesizer of fair quality.
Manufacturers therefore dropped the whole synthesizer - including voices,
pronunciation rules, exception dictionaries and interfacing software - onto an
electronic chipset. This chipset, in turn, was enclosed in a small unit that
the user could connect to his computer. Since a good amount of hardware was
included in this kind of unit, we call them "hardware synthesizers".
They can still be had, but are rather rarely seen.
Modern computers have enough processor power, memory (RAM) and hard disk space
to run a fairly good, or even very good sounding, speech synthesizer.
Furthermore, today's computers have built-in sound cards that are capable of
handling speech and music from multiple sources, with high quality and
precision. As such, the hardware synthesizers are no longer needed. And
excluding them makes your computer system far more portable, especially when
it comes to a laptop. Since most of the speech synthesis is now handled inside
your computer, and little hardware is included in the process, we call the
modern speech synthesizers "software synthesizers". One big benefit of a
software synthesizer is that you can add on as many voices as you want. And
you can have a collection of voices from several manufacturers, meaning that
you can have different voices for different tasks on your computer. They also
might be more responsive in many cases, since things happen far more quickly
inside your computer than most external connections could ever manage.
Speech Synthesizers In Combination With A Screen Reader
Whether you have now manufactured your own synthetic voice, or you took the
shortcut and bought one of the many available on the market, you are only
halfway to making real use of it. It is like buying a flute, and then having
no one to play it.
Several processes on your computer could make use of the new voice. We do
presume that the processes are handled by software that supports, or
communicates with, the interface of your speech synthesizer. One piece of
software might act as a calculator.
It might send any of the numbers you enter, as well as the results, to your
synthesizer, making the whole thing into a talking calculator. Another piece
of software is a game of some kind. Whenever certain things happen in the
game, a phrase or two are sent to your synthesizer for voicing. Or maybe you
have got hold of one of those programs that retrieve the current time at
given intervals, then send the retrieved numbers to the synthesizer, and you
will hear the computer telling you the time of day, every single hour. Still
other software packages are created to retrieve weather reports and forecasts
from the internet, then send the results to the synthesizer, and you will
have them read out to you at given intervals. All of these examples are what
we call Self-Voicing Software. They establish direct contact between
themselves and the speech synthesizer.
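A self-voicing program really is that simple in outline: take an event, build a phrase, send it to the synthesizer. Here is a minimal talking-clock sketch in Python; the speak function is a stand-in for the real call into the synthesizer interface, and all names are invented for the example.

```python
import datetime

def time_phrase(now):
    """Build the phrase a self-voicing talking clock would send."""
    return now.strftime("The time is %H:%M")

def speak(text, log):
    """Stand-in for the call into the real synthesizer interface."""
    log.append(text)

# Pretend the clock fires at a given moment; a real program would
# run this on a timer, every hour or at whatever interval was set.
spoken = []
speak(time_phrase(datetime.datetime(2012, 6, 17, 14, 56)), spoken)
print(spoken)
```

The talking calculator and the weather announcer work the same way; only the source of the text differs.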
Sometimes, we want a bit more use of our synthesizer. Maybe you are a student,
or simply just love to read a good novel. Or do you have huge amounts of
documents you need to read in your job? In such cases, a TTS (Text-To-Speech)
software package could be the thing for you. The job of a TTS is to send a
bigger amount of text to your synthesizer. There is really a good amount of
processing involved in this. For one thing, the software should watch for when
your synthesizer is 'ready to receive text', and then send a chunk over. Then
it patiently waits for the synthesizer to narrate the piece of text, before a
new block of your document is sent to the synthesizer. TTS software might also
hold features for setting up correct pausing at the beginning of each
paragraph and chapter in the document. It might further offer you extended
control over the pronunciation of characters, words and phrases. Many TTS
packages even offer you the chance of turning the narrated document into sound
files, like MP3, that you can play back on your portable player. A cheap and
very good TTS is TextAloud, manufactured by Nextup.com.
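The chunk-and-wait behaviour described above can be sketched in Python. This is a toy illustration of the general idea, not how TextAloud or any real TTS is implemented; the function names, the sentence-splitting rule and the chunk size are all assumptions made for the example.

```python
import re

def chunk_text(document, max_len=120):
    """Split a document into sentence-sized chunks a synthesizer can swallow."""
    sentences = re.split(r'(?<=[.!?])\s+', document.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_len:
            chunks.append(current)      # chunk is full; start a new one
            current = sentence
        else:
            current = (current + " " + sentence).strip()
    if current:
        chunks.append(current)
    return chunks

def narrate(document, is_ready, send):
    """Send one chunk at a time, only when the synthesizer reports ready."""
    for chunk in chunk_text(document):
        while not is_ready():
            pass          # a real TTS would wait on a callback, not spin
        send(chunk)

sent = []
narrate("First sentence. Second sentence. A third one!",
        is_ready=lambda: True, send=sent.append)
print(sent)
```

The pausing and pronunciation features the article mentions would sit on top of this loop, transforming each chunk before it is sent.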
But your possibilities don't end here. Whether you are dyslexic, or maybe you
don't have enough sight to read the computer screen, your new speech
synthesizer will be a 'helping hand' hereafter. Well, if you get a screen
reader software. As the term indicates, this kind of software will act as a
reader, or a pair of eyes, on the computer screen. It will keep track of any
changes that take effect on the screen, and whenever a change is detected, the
information about it is sent to the speech synthesizer, which will read it
out. Thereby, you can hear what is going on on your screen at any time. A
screen reader offers you a long line of controlling features.
You can decide which part of the screen should be watched and narrated, as
well as when to have anything sent to the synthesizer. Further, the more
sophisticated screen readers offer the user full control over how he wants to
be informed about the things happening on the computer. He can, for instance,
decide whether he wants a word read as a word, or spelled out letter by
letter. Or he can decide whether he wants to hear every tiny change that is
made in a window on the screen, or only when given things take place.
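At its core, "keeping track of changes on the screen" is a comparison of snapshots. The sketch below shows the idea with two text-only snapshots; real screen readers hook into the operating system rather than diffing lines, so treat this purely as a conceptual illustration with made-up data.

```python
def screen_changes(old_screen, new_screen):
    """Return the lines that changed between two screen snapshots -
    the text a screen reader would forward to the synthesizer.
    (Assumes both snapshots have the same number of lines.)"""
    changes = []
    for old_line, new_line in zip(old_screen, new_screen):
        if old_line != new_line:
            changes.append(new_line)
    return changes

before = ["File  Edit  View", "Document1 - untitled", ""]
after  = ["File  Edit  View", "Document1 - untitled", "Saved."]
print(screen_changes(before, after))
```

The verbosity settings the article describes would simply filter this change list before anything is spoken.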
A very basic screen reader is included with Windows itself. Any Windows user
can go to the Run dialog and type Narrator, followed by a press on the Enter
key. Microsoft has even included a SAPI voice, which will immediately jump
into action and let you hear some of the things taking place on the computer
screen. As mentioned, this is a very basic screen reader, enough to give you a
taste of what it is all about. Should you want more control and feedback, the
free NVDA screen reader could be a choice. Or you could buy a fully functional
screen reader like Window-Eyes, Jaws, Blindows or SuperNova.
This article intended to let you have a peek behind the scenes when it comes
to some of the technology that makes much of the equipment in our modern lives
speak. I do hope that you have got some answers to questions you might have
had in this regard. Also, I do hope you have got a better grasp of some of the
terms that you might come across when looking for a new voice for your
computer. If you have feedback on the material provided here, you are welcome
to contact me at: [email protected]
Any manufacturer or product name mentioned in this article is the property of
the respective owners. None of them have been mentioned for advertising
reasons, merely to inform the reader of some of the available products on the
market.
June 07, 2012
(C) Copyright, David (Norway) - All rights reserved ---End Of Article---

If you reply to this message it will be delivered to the original sender only. 
If your reply would benefit others on the list and your message is related to 
GW Micro, then please consider sending your message to [email protected] so 
the entire list will receive it.

GW-Info messages are archived at http://www.gwmicro.com/gwinfo. You can manage 
your list subscription at http://www.gwmicro.com/listserv.
