(C) Copyright, David (Norway) - June 2012

---This is the final part of the article.---


Finalizing Your Speech Synthesizer
When you have decided on the exact technique of speech production, built the whole sound library needed, created all the rules for pronunciation and modulation, and constructed the software to handle all of this, you are the happy owner of a new synthetic voice. As you have learned, it could be fully synthetic, or built partially from pre-recorded human sounds. You can now put the voice up for sale on the market. Well, almost. There still remain a couple of decisions to be made. Should you let the voice out the door as a single-voice product? Or should you bundle it up with a few more voices?

Many manufacturers decide to bundle several voices together. Typically, in such a bundle, you would find at least one female and one male voice. Many times, you might even find several versions of the two. A few manufacturers even offer a child voice. And at least one has a voice made up of dog barking - should you ever want such in your projects.
How do you go about making more voices? This greatly depends on the technique you decided to go for. If your synthetic voice is a digitized one, you will need more people to narrate all the words in your synthetic voice's vocabulary. You would then have a female narrator record the whole set of words, and a male narrator do the exact same job. Then let your software pick the matching sound library, according to the request from the end-user.
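As a rough illustration of that last step, here is a minimal sketch of how playback software might pick the right sound library per end-user request. All names here - the library table, the narrators, the file paths - are invented for the example; a real digitized synthesizer would hold one recording per vocabulary word, per narrator.

```python
# Hypothetical word libraries: one set of recordings per narrator.
# In a real product, each entry would point at an actual sound file.
VOICE_LIBRARIES = {
    "female": {"hello": "anna/hello.wav", "world": "anna/world.wav"},
    "male":   {"hello": "bob/hello.wav",  "world": "bob/world.wav"},
}

def recordings_for(text, voice):
    """Map each word of the text to the chosen narrator's recording."""
    library = VOICE_LIBRARIES[voice]
    return [library[word] for word in text.lower().split()]

# The end-user asks for the male voice; the software answers with
# the matching recordings, ready to be played back in order.
print(recordings_for("Hello world", "male"))
```

The point is simply that the same vocabulary exists once per narrator, and the software swaps libraries rather than voices.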
Did you decide to go for the fully electronic voice? This is the easiest one to multiply. A good bit of tweaking - adjusting speed, duration, volume and pitch - of the many individual tones included in your sound library will readily make your voice sound female, male or even childish. You can quite quickly have a deep voice, a thin one, and a really shouty one. And don't you forget to include your first one - robotic as it is.
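A minimal sketch of what that tweaking amounts to: every tone in the library is just a waveform, and pitch, duration and volume are knobs you can turn. The sample rate and the frequencies below are arbitrary choices for the example, not values from any real product.

```python
import math

SAMPLE_RATE = 8000  # samples per second; low, but fine for a sketch

def tone(freq_hz, duration_s, volume=1.0):
    """Generate one synthetic tone as a list of samples.

    Pitch is the frequency, duration the length, volume the
    amplitude - the same three knobs the article describes for
    reshaping an electronic voice.
    """
    n = int(SAMPLE_RATE * duration_s)
    return [volume * math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE)
            for t in range(n)]

# The "same" tone from the library, reshaped for three voices:
deep  = tone(100, 0.2, volume=1.0)   # low pitch: deep, male-sounding
thin  = tone(220, 0.2, volume=0.8)   # higher pitch: thinner voice
child = tone(320, 0.15, volume=0.8)  # higher and shorter: childish
```

Multiplying voices in a fully electronic synthesizer is then just a matter of re-running the library through different parameter sets.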
If you, on the other hand, decided to go for the hybrid technique, there could be a couple of ways of building new voices. Again, you could have several human narrators read a text, and then fragment the recordings into the word fractions you need for each voice. You could further make several adjustments to each of the fractional recordings, which would quickly make a voice sound slightly different. All readers old enough to have played with a tape recorder with speed adjustment will know that it is quite easy to make Mom's voice turn really deep-sounding. Or you could have Daddy sound like a little boy, simply by speeding up the playback. Similarly, in your laboratory, you can perform a load of adjustments on the recorded voices and have them sound different.
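The tape-recorder trick can be sketched in a few lines: playing samples back at a different rate shifts tempo and pitch together. The naive resampler below simply picks every n-th sample; real tools use proper interpolation, but the principle is the same.

```python
def change_speed(samples, factor):
    """Naive resampling: factor > 1 plays faster (and higher-pitched),
    factor < 1 plays slower (and deeper) - just like the tape recorder."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    n = int(len(samples) / factor)
    return [samples[min(int(i * factor), len(samples) - 1)]
            for i in range(n)]

recording = list(range(1000))                 # stand-in for a recorded word
daddy_as_boy = change_speed(recording, 2.0)   # twice as fast, pitched up
mom_gone_deep = change_speed(recording, 0.5)  # half speed, pitched down

print(len(daddy_as_boy), len(mom_gone_deep))  # prints: 500 2000
```

A hybrid synthesizer doing this for real would keep timing intact with more refined pitch-shifting, but speed change alone already yields recognizably different voices.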
Building A Speech Synthesizer
The term "Speech Synthesizer" is basically another term for a bundle of synthetic voices. You would typically give the different voices human names. Your synthesizer, on the other hand, you would name after your company or project. For instance, Microsoft has built a Speech Synthesizer. It is called Microsoft, and it holds several voices, like Mary, Mike and Sam. Another company, AT&T, built a Speech Synthesizer named "AT&T Natural Voices". It holds voices like Crystal, Mike, Mel, Julia and Ray. Nuance name their synthesizer ScanSoft, and it holds voices like Samantha, Daniel, Tom, Nora and Nanna. Another manufacturer, NeoSpeech, has voices like Kate and Paul included in their synthesizer. To distinguish the many voices and synthesizers, or manufacturers, we often refer to them as "Microsoft Mike", "AT&T Crystal" or "Eloquence Sandy". This way, we will easily know whether we have the Mike voice from Microsoft or the Mike voice from AT&T in question. Since any manufacturer can name their voices whatever they want, anyone could have a voice named Mike. But if you listen to Microsoft Mike and AT&T Mike, you will right away hear that they are two totally different voices.
Interfacing Your Voice
Is your product ready for shipping now? Hang on for a moment. There is just one small techie thing left. So far, we have made a synthesizer and voices, but we haven't yet given the end-user any way of "communicating" with the voice. We need to provide a way for the end-user to choose which of the voices in our synthesizer she wants to listen to. Further, a way to send text to the selected voice for narration. Also, the user should be offered the chance to at least slow down or speed up the narration, and maybe alter the pitch. Correctly done, our voice should signal to the computer when it is ready to receive text and commands, and when not to disturb it. And, maybe quite importantly, it should hold a feature for the user to stop the speech at any time.
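The requirements just listed - pick a voice, send text, adjust rate and pitch, signal readiness, and stop on demand - can be sketched as a tiny interface. Every name below is invented for illustration; a real synthesizer would produce audio instead of collecting strings.

```python
class ToySynthesizer:
    """Illustrative control surface for a speech synthesizer."""

    def __init__(self, voices):
        self.voices = list(voices)  # names the end-user may choose from
        self.voice = self.voices[0]
        self.rate = 1.0             # narration speed multiplier
        self.pitch = 1.0
        self.busy = False           # the "do not disturb" signal
        self.spoken = []            # stand-in for actual audio output

    def select_voice(self, name):
        if name not in self.voices:
            raise ValueError("unknown voice: " + name)
        self.voice = name

    def speak(self, text):
        if self.busy:
            raise RuntimeError("synthesizer is busy")
        self.busy = True
        self.spoken.append((self.voice, text))
        self.busy = False

    def stop(self):
        """The all-important panic button: silence the voice at once."""
        self.busy = False

synth = ToySynthesizer(["Mary", "Mike", "Sam"])
synth.select_voice("Mike")
synth.rate = 1.5                    # speed the narration up a bit
synth.speak("Hello there")
```

Whatever the real implementation looks like, some set of calls with this shape is what the end-user's software will be talking to.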
All of this controlling is what we call the "interface" of the speech synthesizer. There are many ways of interfacing a speech synthesizer. The most commonly used standard is called SAPI. Letting your speech synthesizer meet the requirements of the SAPI interface will ensure that it can be used by numerous software packages on the market.
SAPI voices come in two main flavors: SAPI 4 and SAPI 5. Some voices come in both versions, others only in one of them. The SAPI 4 interface offered an extensive range of adjustments for volume, speed and pitch. SAPI 5 voices have somewhat more limited adjustment for each of these features. Unfortunately, you might often find that adjusting a feature in SAPI 5 results in either too much or too little. Say you have set your volume to 5. Adjusting it down to 4, it becomes hard to hear. Setting it to 6, your eardrums are blown. And since you are not offered anything in between 4 and 5, or 5 and 6, you end up not using the volume adjustment too often. The same goes for the other parameters. When it comes to the listening experience, it is all a matter of personal taste and preference. Some will claim that the sound has become a bit clearer in SAPI 5 voices, but that the SAPI 4 voices had better modulation. Some SAPI 5 versions are too eager in modulating their narration. The modulation issue might not really be related to the SAPI version in itself. Rather, the manufacturers who upgraded their SAPI 4 voices might have thought they would do a bit of maintenance on their product, since they had to meddle with it anyway. Unfortunately, such upgrading has, in some cases, resulted in a less pleasant-sounding voice for long-term narration.
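The coarse-steps problem can be pictured in a couple of lines: if the interface only accepts whole-number levels, every finer wish the user has gets snapped to the nearest step. The 0-10 scale below is made up for the example and is not taken from the SAPI specification.

```python
def snap_volume(desired, lo=0, hi=10):
    """Snap a fine-grained wish (any number) to the coarse integer
    steps a whole-number interface offers - there is simply nothing
    available between 4 and 5."""
    return max(lo, min(hi, round(desired)))

print(snap_volume(4.4))  # the user wanted "a touch below 5"...
print(snap_volume(4.6))  # ...but can only ever get 4 or 5
```

This is exactly why, with too-coarse steps, the jump from "hard to hear" to "eardrums blown" can happen in a single notch.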
Yet, not all manufacturers want to 'expose' their speech synthesizer to whatever software or user interaction comes along. The manufacturer could offer a "dedicated" speech synthesizer. This kind of synthesizer can only be reached from the software for which it has been dedicated. So, whilst installing a SAPI voice would in general mean that you can reach the voice from any software that supports SAPI, a dedicated speech synthesizer can only be reached from one given piece of software (like a screen reader).
As always in the computer world, there is a chance of exceptions. Even some SAPI voices on the market are somehow 'locked' to a given piece of software. The voices from Acapela are one example of this. Acapela voices can basically only be used with the software they were bought with, unless you buy a special license that opens them up for usage with other software on your computer. Still, we might not necessarily refer to this kind of SAPI voice as a truly dedicated voice.
A manufacturer might include extra controlling capabilities in a dedicated speech synthesizer - controls that would need more technical handling than what is possible or generally accepted in the SAPI standard. Or he might offer a better-quality, faster-responding, or otherwise modified version of his voice in the dedicated edition. He might offer his voices as SAPI voices, as dedicated ones, or as both. Since a dedicated voice cannot be reached by anything other than the hosting software, there is less chance of interference from other processes on your computer. But then again, you are locked to using the voices provided in the synthesizer from within the hosting software. Therefore, most voices on the market are SAPI voices, leaving them open to receive input from any software that supports this interface.
Hardware Or Software Synthesizers
So far, we have been discussing the construction and interfacing of voices and synthesizers directly on the computer. They are stored on the hard disk, and reached directly inside your computer. In older times (a decade or so ago), when hard disk space was limited and computers ran slowly, the computer simply did not have enough resources to run a speech synthesizer of fair quality. Manufacturers therefore dropped the whole synthesizer - including voices, pronunciation rules, exception dictionaries and interfacing software - onto an electronic chipset. This chipset, in turn, was enclosed in a small unit that the user could connect to his computer. Since a good amount of hardware was included in this kind of unit, we call them "hardware synthesizers". They can still be had, but are rather rarely seen.
Modern computers have enough processor power, memory (RAM) and hard disk space to run a fair, and even good-sounding, speech synthesizer. Furthermore, today's computers have built-in sound cards that are capable of handling speech and music from multiple sources, with high quality and precision. As such, hardware synthesizers are no longer needed. And excluding them makes your computer system far more portable, especially when it comes to a laptop. Since most of the speech synthesis is now handled inside your computer, and little extra hardware is involved in the process, we call the modern speech synthesizers "software synthesizers". One big benefit of a software synthesizer is that you can add on as many voices as you want. And you can have a collection of voices from several manufacturers, meaning that you can have different voices for different tasks on your computer. They also might be more responsive in many cases, since things happen far more quickly inside your computer than most external connections could ever manage.
Speech Synthesizers In Combination With A Screen Reader
Whether you have now manufactured your own synthetic voice, or you took the shortcut and bought one of the many available on the market, you are only halfway to making real use of it. It is like buying a flute, and then having no one to play it.

Several processes on your computer could make use of the new voice. We do presume that the processes are handled by software that supports, or communicates with, the interface of your speech synthesizer. One piece of software might act like a calculator. It might send any of the numbers you enter, as well as the results, to your synthesizer, making the whole thing into a talking calculator. Another piece of software is a game of some kind. Whenever certain things happen in the game, a phrase or two are sent to your synthesizer for voicing. Or maybe you have got hold of one of those programs that retrieve the current time at given intervals, then send the retrieved numbers to the synthesizer, and you will hear the computer telling you the time of day, every single hour. Still other software packages are created so as to retrieve weather reports and forecasts from the internet, then send the results to the synthesizer, and you will have them read out to you at given intervals. All of these examples are what we call self-voicing software. They establish direct contact between themselves and the speech synthesizer.
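A self-voicing talking clock of the kind described fits in a few lines. The speak stub below stands in for whatever call your synthesizer's interface actually offers, and the phrasing is invented for the example.

```python
import datetime

def speak(phrase):
    """Stand-in for sending a phrase to the speech synthesizer."""
    print("SPEAKING:", phrase)

def announce_time(now):
    """Turn a timestamp into the phrase a talking clock would voice."""
    return "The time is {:d}:{:02d}".format(now.hour, now.minute)

speak(announce_time(datetime.datetime(2012, 6, 7, 14, 5)))
# A real self-voicing clock would put this on a timer, once per hour.
```

The talking calculator and the weather reader work the same way: gather some data, format a phrase, hand it to the synthesizer.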
Sometimes, we want a bit more use of our synthesizer. Maybe you are a student, or simply love to read a good novel. Or do you have huge amounts of documents you need to read in your job? In such cases, a TTS (Text-To-Speech) program could be the thing for you. The job of a TTS is to send a bigger amount of text to your synthesizer. There is really a good amount of processing involved in this. For one thing, the software should watch for when your synthesizer is 'ready to receive text', and then send a chunk over. Then it patiently waits for the synthesizer to narrate that piece of text, before a new block of your document is sent over. TTS software might also hold features for setting up correct pausing at the beginning of each paragraph and chapter in the document. It might further offer you extended control when it comes to pronunciation of characters, words and phrases. Many TTS programs even offer you the chance of turning the narrated document into sound files, like MP3, that you can play back on your portable player. A cheap and very good TTS is TextAloud, manufactured by Nextup.com.
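That send-a-chunk, wait, send-the-next loop is the heart of any TTS front end. Here is a minimal sketch, assuming a synthesizer object with a busy flag and a speak method (invented names, in the spirit of the interface discussed earlier):

```python
def narrate_document(synth, document, chunk_size=3):
    """Feed a document to the synthesizer a few words at a time,
    only sending when it signals it is ready (not busy)."""
    words = document.split()
    for start in range(0, len(words), chunk_size):
        while synth.busy:          # wait for 'ready to receive text'
            pass                   # a real program would sleep here
        synth.speak(" ".join(words[start:start + chunk_size]))

class RecordingSynth:
    """Test double: records chunks instead of producing audio."""
    busy = False
    def __init__(self):
        self.chunks = []
    def speak(self, text):
        self.chunks.append(text)

synth = RecordingSynth()
narrate_document(synth, "one two three four five", chunk_size=3)
print(synth.chunks)  # ['one two three', 'four five']
```

Paragraph pausing and pronunciation overrides are extra passes over the text before it reaches this loop; the loop itself stays this simple.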
But your possibilities don't end here. Whether you are dyslexic, or maybe you don't have enough sight to read the computer screen, your new speech synthesizer will be a 'helping hand' from here on. Well, if you get a screen reader. As the term indicates, this kind of software will act as a reader, or a pair of eyes, on the computer screen. It will keep track of any changes that take effect on the screen, and whenever a change is detected, the information about it is sent to the speech synthesizer, which reads it out. Thereby, you can hear what is going on on your screen at any time. A screen reader offers you a long line of controlling features. You can decide which part of the screen should be watched and narrated, as well as when to have anything sent to the synthesizer. Further, the more sophisticated screen readers offer the user full control of how he wants to be informed about the things happening on the computer. He can, for instance, decide whether he wants a word read as a word, or spelled out letter by letter. Or he can decide if he wants to hear every tiny change that is being made in a window on the screen, or only when given things take place.
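The read-as-a-word versus spell-it-out choice is a simple dispatch, sketched below. The mode names and the function are invented for illustration; every real screen reader has its own commands for this.

```python
def render(word, mode="word"):
    """Return what the screen reader should send to the synthesizer:
    the word itself, or its letters spelled out one by one."""
    if mode == "spell":
        return " ".join(word.upper())
    return word

print(render("cat"))            # cat
print(render("cat", "spell"))   # C A T
```

The verbosity setting works on the same principle: the screen reader filters what it sees on screen through the user's preferences before anything reaches the voice.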
A very basic screen reader is included with Windows itself. Any Windows user can open the Run dialog, type Narrator, and press the Enter key. Microsoft has even included a SAPI voice, which will immediately jump into action and let you hear some of the things taking place on the computer screen. As mentioned, this is a very basic screen reader - enough to give you a taste of what it is all about. Should you feel the need for more control and feedback, the free NVDA screen reader could be a choice. Or you could buy a fully functional screen reader like Window-Eyes, Jaws, Blindows or SuperNova.
This article intended to let you have a peek behind the scenes of some of the technology that makes so much equipment in our modern lives speak. I do hope that you have got some answers to questions you might have had in this regard. Also, I do hope you have got a better grasp of some of the terms that you might come across when looking for a new voice for your computer. If you have feedback on the material provided here, you are welcome to contact me at:
[email protected]
Any manufacturer or product name mentioned in this article is the property of its respective owner. None of them have been mentioned for advertising reasons, merely to inform the reader of some of the available products on the market.
June 07, 2012
© Copyright, David (Norway) - All rights reserved
---End Of Article---
