Re: [on-asterisk] Voicemail to text translation

Dave Donovan Sat, 20 Sep 2008 08:10:10 -0700

Lloyd,

I'm not an expert in these things, and the last time I looked at it
was over a year ago but I'll tell you what I learned and hope that it
saves you some time.

You're not going to find a program that just does this conversion,
like: wav2txt outfile.txt infile.wav.

There are packages like CMU Sphinx and its successors.  I used the
Java one, and it was pretty cool what it could do, but it's not
trivial to understand it and tune it well for your application.
It's doable, but it's not something you're going to download and run
like zipping a file.  The reason you were pointed to ZoIP is that it
is a good example of using Asterisk and speech recognition.  I think
it started using Sphinx and moved to Cepstral, but I could be wrong.

It's a pretty complicated requirement you have, because you're
talking about unrestricted speech.  That is, the speaker could say any
word, not just {yes, no} or {1,2,3,4,5,6,7,8,9,0,oh}.  The larger
the set of words being used, the higher the rate of misdetection and
the higher the load on the system.  Load can translate to delay on
a packed system, but since your application isn't interactive, you
don't have to worry about that too much.

Dictionaries and grammars are two things critical to Sphinx (and
others, I guess).

A dictionary is a list of all the words that could be spoken in a
particular context and what their associated sounds (phonemes) are.
Tomato is one word with two pronunciations: one with a long 'a' and
one with a short 'a' (you say tomaeto, I say tomaato).  'Their' and
'there' are two words with the same phonemes.
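
To make that concrete, here is a minimal Python sketch of a
pronunciation dictionary.  The entries are written in the style of a
CMU Sphinx .dic file (word, then ARPAbet-style phonemes, with an
alternate pronunciation marked by a "(2)" suffix), but treat the exact
phoneme strings and layout as assumptions for illustration, not lines
copied from a real dictionary.  It shows one word with two
pronunciations and two words that share one.

```python
from collections import defaultdict

# Hypothetical dictionary text: WORD followed by its phonemes.
# A "(2)" suffix marks an alternate pronunciation of the same word.
DICT_TEXT = """\
TOMATO     T AH M EY T OW
TOMATO(2)  T AH M AA T OW
THEIR      DH EH R
THERE      DH EH R
"""

def load_dictionary(text):
    """Map each word to the list of phoneme sequences it may be spoken as."""
    prons = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        word, *phones = line.split()
        word = word.split("(")[0]  # strip the "(2)" variant marker
        prons[word].append(tuple(phones))
    return dict(prons)

def homophones(prons):
    """Group words that share an identical phoneme sequence."""
    by_sound = defaultdict(set)
    for word, variants in prons.items():
        for phones in variants:
            by_sound[phones].add(word)
    return {p: w for p, w in by_sound.items() if len(w) > 1}

prons = load_dictionary(DICT_TEXT)
print(len(prons["TOMATO"]))                            # 2
print(sorted(homophones(prons)[("DH", "EH", "R")]))    # ['THEIR', 'THERE']
```

The recognizer has to pick between TOMATO's two entries from the audio
alone, and between THEIR and THERE it can't, which is where grammars
come in.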

Ideally, an IVR would ask a question with a limited domain of answers
like "Which department would you like to speak with?  You can say
things like Billing, Customer Service, Technical Support."  and then
you would have a dictionary with things like {Billing, Customer
Service, Technical Support, AP, AR, Helpdesk}.  Note that this is an
extremely small set of possibilities relative to the number of words
that could be spoken in a voicemail.  The bigger the dictionary, the
slower the system.  It's like doing a SQL SELECT: if there are 10
rows, you're going to get a quick response, but if you start looking
for patterns of characters in millions of rows, expect to wait a
second or two unless you've got big horsepower.
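
As a toy illustration of why a small, restricted vocabulary helps, this
Python sketch matches a heard phoneme sequence against every dictionary
entry by edit distance.  This is NOT how Sphinx actually searches (it
uses acoustic models and a much smarter decoder); it is only an
analogy showing that the work grows with every word you add, and that
every extra similar-sounding word is another chance to pick the wrong
one.  The vocabulary and its phoneme spellings are hypothetical
approximations.

```python
# Approximate ARPAbet-style spellings for a hypothetical IVR vocabulary.
IVR_VOCAB = {
    "BILLING": ("B", "IH", "L", "IH", "NG"),
    "HELPDESK": ("HH", "EH", "L", "P", "D", "EH", "S", "K"),
    "SUPPORT": ("S", "AH", "P", "AO", "R", "T"),
}

def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # drop a phoneme from a
                           cur[j - 1] + 1,           # insert a phoneme from b
                           prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

def best_match(heard, vocabulary):
    """Scan every entry; return (closest word, its distance, entries checked)."""
    best_word, best_dist, checked = None, None, 0
    for word, phones in vocabulary.items():
        checked += 1
        d = edit_distance(heard, phones)
        if best_dist is None or d < best_dist:
            best_word, best_dist = word, d
    return best_word, best_dist, checked

# The caller said something close to "billing" but the last sound was muddy.
print(best_match(("B", "IH", "L", "IH", "N"), IVR_VOCAB))  # ('BILLING', 1, 3)
```

Doubling the vocabulary doubles the entries checked, which is the
table-scan point above in miniature.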

You also have to consider grammars.  If I remember correctly, grammars
tell the system what words can come in what order based on the words
around them.  This is so that when your user says "My IP address is
192.168.0.1", you don't end up with a text file that says "My eye pea
address is won nine to dot one six ate dot oh dot won."  All of those
sounds were correctly converted into text, but this is not very useful
as output.  The grammar would tell the system that numbers following
the word 'address' are to be recorded as digits and periods.  It can
also help the system distinguish between homophones (words that sound
the same) like 'there', 'their' and 'they're'.  If the sound is
followed by a noun like 'chair', then use 'their'.  If it's followed by
a verb like 'running', then use 'they're'.  If it's preceded by a verb
or preposition, then use 'there'.  That's just a simple example; you
can see how complicated this could get.
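
The two grammar examples above can be sketched as hand-written rules
in Python.  Real grammars (for example the JSGF grammars Sphinx-4
reads) are declarative and far more general; here the tiny word lists,
normalize_numbers and pick_homophone are hypothetical, and the sketch
assumes the recognizer has already produced number words like "one"
rather than "won".

```python
# Hypothetical word lists; a real system would use a grammar or tagger.
NUMBER_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7", "eight": "8",
    "nine": "9",
}
NOUNS = {"chair", "car", "house"}
NEXT_VERBS = {"running", "going", "coming"}
PREV_VERBS = {"went", "sat", "put", "is"}
PREPOSITIONS = {"over", "in", "from", "to"}

def normalize_numbers(words):
    """Once 'address' has been heard, join spoken digits and 'dot' into one token."""
    out, run, seen_address = [], [], False
    for w in words + [None]:  # trailing None flushes any pending run
        if seen_address and w is not None and (w in NUMBER_WORDS or w == "dot"):
            run.append("." if w == "dot" else NUMBER_WORDS[w])
            continue
        if run:
            out.append("".join(run))
            run = []
        if w is not None:
            out.append(w)
            seen_address = seen_address or w == "address"
    return " ".join(out)

def pick_homophone(prev_word, next_word):
    """Choose 'their', "they're" or 'there' from the neighbouring words."""
    if next_word in NOUNS:
        return "their"
    if next_word in NEXT_VERBS:
        return "they're"
    if prev_word in PREV_VERBS or prev_word in PREPOSITIONS:
        return "there"
    return None  # no rule fired; a real grammar would weigh more context

heard = "my IP address is one nine two dot one six eight dot zero dot one"
print(normalize_numbers(heard.split()))  # my IP address is 192.168.0.1
print(pick_homophone("in", "chair"))     # their
print(pick_homophone(None, "running"))   # they're
print(pick_homophone("over", None))      # there
```

The sketch checks the "following word" rules first, so "in their
chair" picks 'their' even though 'in' is a preposition.  Which rule
wins is an assumption; a real grammar has to settle conflicts like
that somehow.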

Fortunately, many dictionaries and grammars already exist.  Chances
are, though, you will need to understand them and do some work to fit
them to your application.  They usually don't contain jargon.  If your
client is a cellular company and a customer can call for support on
their 'iphone', then you're going to need to configure the system for
that.
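
One rough way to spot the jargon problem early: check the words you
expect callers to use against the dictionary and add entries for the
ones it lacks.  This Python sketch uses a tiny in-memory dictionary as
a hypothetical stand-in for a stock one; find_oov, add_words and the
phoneme strings are illustrative assumptions, not a Sphinx API.

```python
# Hypothetical stand-in for a stock dictionary: word -> phonemes.
BASE_DICT = {
    "MY": "M AY",
    "PHONE": "F OW N",
    "IS": "IH Z",
    "BROKEN": "B R OW K AH N",
}

def find_oov(words, dictionary):
    """Return words (upper-cased, first-seen order) missing from the dictionary."""
    missing = []
    for w in words:
        key = w.upper()
        if key not in dictionary and key not in missing:
            missing.append(key)
    return missing

def add_words(dictionary, extra):
    """Return a new dictionary with application-specific entries merged in."""
    merged = dict(dictionary)
    merged.update({w.upper(): p for w, p in extra.items()})
    return merged

phrase = "my iphone is broken".split()
print(find_oov(phrase, BASE_DICT))                       # ['IPHONE']
custom = add_words(BASE_DICT, {"iphone": "AY F OW N"})  # pronunciation is a guess
print(find_oov(phrase, custom))                          # []
```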

Once you get a system up and running, a fun test phrase to use is
"recognize speech".  Depending on your accent, this is often detected
as "wreck a nice beach."  Try saying it like someone from the southern
US.

In short: googling for wav2txt.gz is not going to get it done.  You're
going to have to put in some substantial work before your application
is ready to wreck a nice beach.

Good luck,

Dave

On Sat, Sep 20, 2008 at 9:09 AM, Aloysius Thevarajah Lloyd
<[EMAIL PROTECTED]> wrote:
> Thank you Duane.
>
> I did not get enough information from the Link.
>
> I am looking for an application to convert *wav -> text*.  Is there any
> open source application available to do this task?
>
> THank you
> Lloyd
>
>
>
> On Fri, Sep 19, 2008 at 9:27 PM, Duane at e164 dot org <[EMAIL PROTECTED]> wrote:
>
>> Aloysius Thevarajah Lloyd wrote:
>> > Hello,
>> >
>> > Is there any open source way to automate the translation or
>> > conversion of voicemail files into text?
>>
>> Have a look at ZoIP
>>
>> http://www.uc.org/read/ZoIP
>>
>> Not what you want but lays the ground work for it.
>>
>> --
>>
>> Best regards,
>>  Duane
>>
>
>
> --
> Thanks
>
> LLoyd
>
