Re: [on-asterisk] Voicemail to text translation

Aloysius Thevarajah Lloyd Wed, 24 Sep 2008 20:09:09 -0700

 Dave Donovan,

Thank you for the very helpful explanation.


Thank you again.

Lloyd


On Sat, Sep 20, 2008 at 11:10 AM, Dave Donovan <[EMAIL PROTECTED]> wrote:

> Lloyd,
>
> I'm not an expert in these things, and the last time I looked at it
> was over a year ago but I'll tell you what I learned and hope that it
> saves you some time.
>
> You're not going to find a program that just does this conversion,
> like: wav2txt outfile.txt infile.wav.
>
> There are packages like CMU Sphinx and its successors.  I used the
> Java one and it was pretty cool what it could do but it's not a
> trivial thing to understand it and tune it well for your application.
> It's doable, but it's not something you're going to download and run
> like zipping a file.  The reason you were pointed to ZoIP is that it is
> a good example of using Asterisk and speech recognition.  I think it
> started using Sphinx and moved to Cepstral, but I could be wrong.
>
> It's a pretty complicated requirement you have.  The reason is that
> you're talking about unrestricted speech.  That is, the speaker could
> say any word, not just {yes, no} or {1,2,3,4,5,6,7,8,9,0,o}.  The larger
> the set of words being used, the higher the rate of misdetection and
> the higher the load on the system.  Load can translate to delay on
> a packed system but since your application isn't interactive, you
> don't have to worry about that too much.
>
> Dictionaries and Grammars are two things critical to Sphinx (and
> others I guess).
>
> A dictionary is a list of all the words that could be spoken in a
> particular context and their associated sounds (phonemes).
> Tomato is one word with two pronunciations, one with a long 'a' and
> one with a short 'a'.  (You say tomaeto, I say tomaato).  Their and
> There are two words with the same sequence of phonemes.
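A minimal sketch of the dictionary idea above, not Sphinx's actual
data structure. The phoneme symbols follow the ARPAbet style used by
the CMU Pronouncing Dictionary; the names PRONUNCIATIONS and
words_by_sound are made up for illustration.

```python
from collections import defaultdict

# Word -> list of pronunciations, each a list of ARPAbet-style phonemes.
# "tomato" has two pronunciations; "their" and "there" share one.
PRONUNCIATIONS = {
    "tomato": [
        ["T", "AH0", "M", "EY1", "T", "OW2"],  # "tomaeto"
        ["T", "AH0", "M", "AA1", "T", "OW2"],  # "tomaato"
    ],
    "their": [["DH", "EH1", "R"]],
    "there": [["DH", "EH1", "R"]],
}


def words_by_sound(pronunciations):
    """Invert the dictionary: phoneme sequence -> set of words that match it."""
    index = defaultdict(set)
    for word, variants in pronunciations.items():
        for phones in variants:
            index[tuple(phones)].add(word)
    return index


# Every word the recognizer could emit for the sound "DH EH1 R".
homophones = words_by_sound(PRONUNCIATIONS)[("DH", "EH1", "R")]
```

The inverted index shows why a dictionary alone cannot choose between
'their' and 'there': both map to the same phoneme sequence.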
>
> Ideally, an IVR would ask a question with a limited domain of answers
> like "Which department would you like to speak with?  You can say
> things like Billing, Customer Service, Technical Support."  and then
> you would have a dictionary with things like {Billing, Customer
> Service, Technical Support, AP, AR, Helpdesk}.  Note that this is an
> extremely small set of possibilities relative to the number of words
> that could be spoken in a voicemail.  The bigger the dictionary the
> slower the system.  It's like doing a SQL SELECT: if there are 10
> rows, you're going to get a quick response but if you start looking
> for patterns of characters in millions of rows, expect to wait a second
> or two unless you've got big horsepower.
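To illustrate the limited-domain case, here is a rough sketch that
snaps a noisy transcript onto a small fixed vocabulary. It uses plain
string similarity (Python's difflib) as a stand-in for acoustic
scoring, so it is not how a recognizer actually matches audio; the
name snap_to_vocabulary and the cutoff value are assumptions.

```python
from difflib import get_close_matches

# The small answer set from the IVR example above.
DEPARTMENTS = [
    "billing",
    "customer service",
    "technical support",
    "ap",
    "ar",
    "helpdesk",
]


def snap_to_vocabulary(heard, vocabulary=DEPARTMENTS, cutoff=0.6):
    """Return the closest allowed phrase, or None if nothing is close enough."""
    matches = get_close_matches(heard.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

With only six candidates, a slightly wrong guess such as "biling" can
still be resolved to a valid answer, and anything outside the domain
is rejected. Free-form voicemail has no such list to fall back on.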
>
> You also have to consider grammars.  If I remember correctly, grammars
> tell the system what words can come in what order based on words
> around them.  This is so that when your user says "My IP Address is
> 192.168.0.1", you don't end up with a text file that says "My eye pea
> address is won nine to dot one six ate dot oh dot won."  All of those
> sounds were correctly converted into text but this is not very useful
> as output.  The grammar would tell the system that numbers following
> the word 'address' are to be recorded as digits and periods.  It can
> also help the system distinguish between homophones (words that sound
> the same) like 'there' and 'their' and 'they're'.  If the sound is
> followed by a noun like 'chair' then use 'their'.  If it's followed by
> a verb like 'running' then use 'they're'.  If it's preceded by a verb
> or preposition then use 'there'.  That's just a simple example; you
> can see how complicated this could get.
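The two rules above can be sketched as toy post-processing functions.
This is only an illustration of the idea, not a Sphinx grammar: real
grammars constrain the recognizer itself rather than patching its
output. All word lists and function names here are hypothetical.

```python
# Spoken words (including sound-alikes) -> the symbol they stand for.
SPOKEN_TO_SYMBOL = {
    "oh": "0", "zero": "0", "won": "1", "one": "1",
    "to": "2", "too": "2", "two": "2", "three": "3",
    "for": "4", "four": "4", "five": "5", "six": "6",
    "seven": "7", "ate": "8", "eight": "8", "nine": "9",
    "dot": ".",
}

# Tiny hand-made word classes; a real system would use a tagged lexicon.
NOUNS = {"chair", "car", "house"}
VERBS = {"running", "going", "coming"}
VERBS_AND_PREPOSITIONS = {"go", "sit", "over", "in"}


def normalize_address(text, trigger="address is "):
    """Rewrite the words after 'address is' as digits and periods."""
    head, sep, tail = text.partition(trigger)
    if not sep:
        return text  # trigger phrase not present
    symbols = []
    for word in tail.rstrip(".").split():
        if word not in SPOKEN_TO_SYMBOL:
            return text  # unknown word: leave the transcript alone
        symbols.append(SPOKEN_TO_SYMBOL[word])
    return head + sep + "".join(symbols)


def pick_homophone(prev_word, next_word):
    """Choose among their / they're / there from the neighbouring words."""
    if next_word in NOUNS:
        return "their"
    if next_word in VERBS:
        return "they're"
    if prev_word in VERBS_AND_PREPOSITIONS:
        return "there"
    return None  # no rule applies
```

For example, normalize_address applied to the garbled sentence above
turns the tail into 192.168.0.1 (the "eye pea" part would need its own
rule), and pick_homophone(None, "chair") gives 'their'.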
>
> Fortunately many dictionaries and grammars already exist.  Chances are
> you will need to understand them though and do some work to fit them
> to your application.  They usually don't contain jargon.  If your
> client is a cellular company and a customer can call for support on
> their 'iphone' then you're going to need to configure the system for
> that.
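Adding jargon could look roughly like the sketch below. It assumes a
CMUdict-style plain-text layout ("WORD PHONE PHONE ...", with a
parenthesised number marking alternate pronunciations); check the
format your package actually expects. The phonemes given for 'iphone'
are a guess, and parse_dict_lines is a made-up helper.

```python
import re


def parse_dict_lines(lines):
    """Parse 'WORD PH1 PH2 ...' lines into {word: [pronunciation, ...]}."""
    entries = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;;"):  # skip blanks and comments
            continue
        head, *phones = line.split()
        # Strip an alternate-pronunciation marker such as "(2)".
        word = re.sub(r"\(\d+\)$", "", head).lower()
        entries.setdefault(word, []).append(phones)
    return entries


# A stock dictionary fragment plus one application-specific entry.
base = parse_dict_lines([
    "PHONE F OW1 N",
    "TOMATO T AH0 M EY1 T OW2",
    "TOMATO(2) T AH0 M AA1 T OW2",
])
jargon = parse_dict_lines(["IPHONE AY1 F OW1 N"])  # assumed pronunciation

# Merge the jargon into the base dictionary without dropping stock entries.
for word, pronunciations in jargon.items():
    base.setdefault(word, []).extend(pronunciations)
```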
>
> Once you get a system up and running, a fun test phrase to use is
> "recognize speech".  Depending on how you say it, this is often
> detected as "wreck a nice beach."  It depends on your accent.  Try to
> say it like someone from the southern US.
>
> In short: googling for wav2txt.gz is not going to get it done.  You're
> going to have to put in some substantial work before your application
> is ready to wreck a nice beach.
>
> Good luck,
>
> Dave
>
>
> On Sat, Sep 20, 2008 at 9:09 AM, Aloysius Thevarajah Lloyd
> <[EMAIL PROTECTED]> wrote:
> > Thank you Duane.
> >
> > I did not get enough information from the Link.
> >
> > I am looking for an application to convert wav -> text. Is there any
> > open source application available to do this task?
> >
> > Thank you
> > Lloyd
> >
> >
> >
> > On Fri, Sep 19, 2008 at 9:27 PM, Duane at e164 dot org <[EMAIL PROTECTED]>
> > wrote:
> >
> >> Aloysius Thevarajah Lloyd wrote:
> >> > Hello,
> >> >
> >> > Is there any open source application to automate the translation or
> >> > conversion of voicemail files into text?
> >>
> >> Have a look at ZoIP
> >>
> >> http://www.uc.org/read/ZoIP
> >>
> >> Not what you want but lays the ground work for it.
> >>
> >> --
> >>
> >> Best regards,
> >>  Duane
> >>
> >> http://www.freeauth.org - Enterprise Two Factor Authentication
> >> http://www.nodedb.com - Think globally, network locally
> >> http://www.sydneywireless.com - Telecommunications Freedom
> >> http://e164.org - Global Communication for the 21st Century
> >>
> >> "In the long run the pessimist may be proved right,
> >>    but the optimist has a better time on the trip."
> >>
> >>
> >
> >
> > --
> > Thanks
> >
> > LLoyd
> >
> > Tel : 416-628-6090
> > Fax : 416-628-6095
> > Cell : 416-500-8014
> > Toll Free : 1-888-401-3735
> >


-- 
Thanks

LLoyd

Tel : 416-500-8014
