This sounds like what I'm trying to do. Thanks! - juju

On Thu, Aug 27, 2009 at 2:22 PM, Taka Kojima <t...@gigafied.com> wrote:

> You wouldn't actually do the speech recognition in Flash, but rather have
> Flash record to a file and have the server process it (using an already
> developed program, of course, since, as Steven explains, speech to text is
> not a simple matter).
>
> So essentially the flow would be:
>
> Flash Player would record the sound, tell the server to save the file to x
> location, open a socket connection to the server, and tell the server to
> process the sound file. Then, through the socket connection, the server
> would send data back to Flash as it processes the file.
>
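The server side of the flow above could be sketched roughly like this. It is a minimal Python sketch under stated assumptions, not Taka's actual implementation: the function names (`transcribe_in_chunks`, `stream_results`), the newline-delimited JSON message format, and the `recognize` callback that stands in for an already built speech-to-text program are all hypothetical.

```python
import json
import socket
from typing import Callable, Iterator


def transcribe_in_chunks(audio: bytes,
                         recognize: Callable[[bytes], str],
                         chunk_size: int = 4096) -> Iterator[str]:
    """Yield one partial transcript per fixed-size chunk of the saved audio.

    `recognize` is a hypothetical hook: it wraps whatever existing
    speech-to-text program is installed on the server.
    """
    for start in range(0, len(audio), chunk_size):
        yield recognize(audio[start:start + chunk_size])


def stream_results(conn: socket.socket,
                   audio: bytes,
                   recognize: Callable[[bytes], str],
                   chunk_size: int = 4096) -> None:
    """Send results back over the open socket while the file is processed.

    Each message is one line of JSON (an assumed wire format), so the
    client can read line by line and update as results arrive.
    """
    for text in transcribe_in_chunks(audio, recognize, chunk_size):
        message = json.dumps({"type": "partial", "text": text}) + "\n"
        conn.sendall(message.encode("utf-8"))
    # Tell the client that processing has finished.
    conn.sendall((json.dumps({"type": "done"}) + "\n").encode("utf-8"))
```

On the Flash side, a socket client could read one line at a time and display each partial result as it comes in, then stop when it sees the "done" message.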
> This is a really simple overview, but essentially, if you can in fact
> record sound through the microphone with Flash Player and you have an
> already built speech-to-text program installed and working (with an API
> that you could plug into), then it is feasible, and it's not as
> complicated as writing your own speech-to-text software in ActionScript.
>
> I have implemented a similar solution for creating video files with Flash
> (i.e. you drag 5-10 second clips onto a timeline, send the data back to the
> server, the server reads the sequence, puts the clips together into a file
> and exports as a new video file), so this is a workable and good solution,
> as well as your best bet.
>
> - Taka
>
> On Wed, Aug 26, 2009 at 10:41 PM, Steven Sacks <flash...@stevensacks.net>
> wrote:
>
> > This is how you record sound:
> > http://www.getmicrophone.com/?p=69
> >
> > If you're asking how to convert sound waves into speech, dude, what?  Do
> > you realize how challenging speech recognition is?  Wait, why am I asking
> > you this?  If you did, you wouldn't be asking people on a Flash list how
> > to do it, as if it's some piece of code somebody can copy and paste or a few
> > links that will tell you the secret formula.
> >
> > Most speech-to-text programs are based on hidden Markov models. In
> > speech recognition, the hidden Markov model would output a sequence of
> > n-dimensional real-valued vectors (with n being a small integer, such as
> > 10), outputting one of these every 10 milliseconds. The vectors would
> > consist of cepstral coefficients, which are obtained by taking a Fourier
> > transform of a short time window of speech and decorrelating the spectrum
> > using a cosine transform, then taking the first (most significant)
> > coefficients. The hidden Markov model will tend to have in each state a
> > statistical distribution that is a mixture of diagonal covariance
> > Gaussians, which will give a likelihood for each observed vector. Each
> > word, or (for more general speech recognition systems) each phoneme, will
> > have a different output distribution; a hidden Markov model for a sequence
> > of words or phonemes is made by concatenating the individual trained
> > hidden Markov models for the separate words and phonemes.
> >
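Two pieces of the paragraph above can be made concrete: the cepstral coefficients for one short frame, and the likelihood a mixture of diagonal-covariance Gaussians assigns to one observed vector. What follows is a rough pure-Python sketch, not code from any real recognizer. The function names are made up for illustration, the log of the magnitude spectrum is taken before the cosine transform (the usual cepstrum definition, although the paragraph does not mention it), and real systems normally add windowing and a mel filterbank, both omitted here.

```python
import cmath
import math
from typing import List, Sequence


def magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """Naive O(n^2) DFT magnitudes for the non-negative frequency bins."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def dct2(values: Sequence[float]) -> List[float]:
    """Unnormalized DCT-II, the cosine transform used to decorrelate."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n)
            for i, v in enumerate(values))
        for k in range(n)
    ]


def cepstral_coefficients(frame: Sequence[float],
                          num_coeffs: int = 10,
                          floor: float = 1e-10) -> List[float]:
    """Fourier transform -> log magnitude -> cosine transform -> first coeffs.

    `floor` (an assumption) keeps log() away from zero-valued bins.
    """
    log_spec = [math.log(max(m, floor)) for m in magnitude_spectrum(frame)]
    return dct2(log_spec)[:num_coeffs]


def diag_gmm_log_likelihood(x: Sequence[float],
                            weights: Sequence[float],
                            means: Sequence[Sequence[float]],
                            variances: Sequence[Sequence[float]]) -> float:
    """Log-likelihood of one vector under a diagonal-covariance Gaussian mixture."""
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        # A diagonal covariance means each dimension contributes independently.
        log_pdf = sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
                      for xi, m, v in zip(x, mu, var))
        log_terms.append(math.log(w) + log_pdf)
    peak = max(log_terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))
```

In a recognizer of the kind described, each hidden Markov model state would hold its own `weights`, `means`, and `variances`, and a new feature vector would be scored against every state roughly every 10 milliseconds.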
> > There you have it. That's a high level overview of speech to text. Do you
> > understand anything in that paragraph?  Probably not.
> >
> > Unless you're willing to study and put in the time to figure out how to
> > do this, you're not going to figure it out.  Nobody is going to point you
> > in the right direction because this is a very niche knowledge area and
> > none of these people are on Flashcoders.  They're at universities working
> > on their doctorates or working for the military or government, or some
> > private company and they're not sharing this information.  This is the
> > stuff patents are made of.
> >
> > So either give up now (because what you want is some easy solution and
> > there isn't one) or start doing real research, learn some serious
> > Calculus, become an expert on sound, speech, waveforms, and then figure
> > out how to port all of this into Flash, which, in all likelihood, lacks
> > the performance to actually achieve this.
> >
> > You'll probably have to do it on the server, passing the sound to the
> > server as an mp3 file, and then pass the text back. That's the only thing
> > I can think of that would possibly be able to do this.
> >
> > Prove me wrong.  If you pull this off, you could probably build an entire
> > company around your technology.
> >
> > _______________________________________________
> > Flashcoders mailing list
> > Flashcoders@chattyfig.figleaf.com
> > http://chattyfig.figleaf.com/mailman/listinfo/flashcoders
> >
