If you are really interested in Hidden Markov Models, I recommend some
of Marie Roch's courses at SDSU when she comes back from sabbatical.
She's quite good. Her work on interpreting dolphin sounds is fascinating.
Tracy R Reed wrote:
Now, we have to analyze that data. An FFT can be done in O(n log n),
so we need about 10^9 * log(10^9) flops, or roughly 10^10 flops,
continuously producing frequency bins.
I read that the latest FPU in a 3 GHz x86 machine can do 24 GFLOPS. So
nearly half a machine would be required, right?
Somewhere in that range. Remember, we are just getting orders of
magnitude. For that purpose I would consider anything from 0.5 to 50
GFLOPS to be 10^9 FLOPS.
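For what it's worth, here is a quick sanity check of that arithmetic as a
rough Python sketch. The 10^9 samples per second and the 24 GFLOPS peak are
just the figures from this thread, not measurements:

    import math

    # Back-of-envelope check of the numbers above.  Both inputs are the
    # thread's assumptions, not measurements.
    n = 1e9                 # samples per second to transform
    machine_flops = 24e9    # claimed peak of one 3 GHz x86 FPU

    # An FFT is O(n log n); the constant factor is ignored since we only
    # care about the order of magnitude.
    work = n * math.log2(n)
    print(f"work per second: {work:.1e}")                            # ~3.0e+10

    # Rounding that to ~10^10 flops, as above, gives a sizable fraction
    # of one machine; keeping the log2 factor gives a bit more than one.
    print(f"rounded estimate: {1e10 / machine_flops:.2f} machines")  # ~0.42
    print(f"with log2(n):     {work / machine_flops:.2f} machines")  # ~1.25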
Now, we have to analyze the DFTs and convert them to something useful.
What is a DFT? Discrete Fourier transform?
Yes.
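For concreteness, here is a toy DFT over a single short frame of audio,
computed with numpy's FFT. The 8 kHz sample rate, 25 ms frame, and 440 Hz
test tone are made-up stand-ins for real speech:

    import numpy as np

    fs = 8000                                  # sample rate (assumed)
    t = np.arange(int(0.025 * fs)) / fs        # one 25 ms frame
    frame = np.sin(2 * np.pi * 440 * t) + 0.1 * np.random.randn(t.size)

    # The DFT of the windowed frame gives the frequency bins mentioned above.
    spectrum = np.fft.rfft(frame * np.hamming(frame.size))
    freqs = np.fft.rfftfreq(frame.size, d=1 / fs)
    peak = freqs[np.argmax(np.abs(spectrum))]
    print(f"{freqs.size} bins, strongest near {peak:.0f} Hz")   # ~440 Hz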
But the problem you describe may not really be the problem they need to
solve. They don't need to do general-case speech recognition. If someone
buys the adword "hamburger" for my geographic area, then all they have to
do is search for when I say "hamburger". Most big companies I call now
seem to have basic keyword recognition systems built into their PBXs,
which presumably run on relatively modest hardware. So the problem may
not be nearly as large as you describe.
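One classic low-cost approach to that kind of small-vocabulary matching is
whole-word template matching with dynamic time warping. The sketch below is
generic, not a claim about how any particular PBX works, and the random 2-D
feature sequences are stand-ins for real spectral or MFCC frames:

    import numpy as np

    def dtw(a, b):
        """Dynamic-time-warping distance between two feature sequences
        (frames x dims), using the standard O(len(a)*len(b)) recurrence."""
        n, m = len(a), len(b)
        cost = np.full((n + 1, m + 1), np.inf)
        cost[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = np.linalg.norm(a[i - 1] - b[j - 1])
                cost[i, j] = d + min(cost[i - 1, j],      # stretch a
                                     cost[i, j - 1],      # stretch b
                                     cost[i - 1, j - 1])  # advance both
        return cost[n, m]

    def spot(utterance, templates):
        """Forced choice: return the keyword whose template is closest."""
        return min(templates, key=lambda w: dtw(utterance, templates[w]))

    # Toy usage with made-up 2-D "feature" sequences standing in for MFCCs.
    rng = np.random.default_rng(0)
    templates = {"hamburger": rng.normal(size=(40, 2)),
                 "operator":  rng.normal(size=(35, 2))}
    utterance = templates["hamburger"] + 0.1 * rng.normal(size=(40, 2))
    print(spot(utterance, templates))   # -> hamburger

With only a handful of templates to score, the work per utterance is tiny
compared to general speech recognition.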
It depends. If all I need to discriminate between is "one", "two",
"three", ... "nine", "yes", "no", and "operator" while I am specifically
accessing a voicemail or directory system, then that requires quite a bit
less power.
Of course, if I say "hamburger", that's pretty close to "operator" and
I'm likely to get that choice. Or, if I say "fine", I'm likely to enter
"nine" rather than "yes".
Discriminating between 20 specific words in a specific context is doable.
Discriminating between thousands of words in casual conversation is not.
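To make that forced-choice failure mode concrete: with a closed vocabulary,
whatever you say gets mapped to the nearest entry, however poor the match.
Here is a crude illustration using edit distance over rough phoneme spellings
(the spellings and the distance measure are my own approximations; a real
recognizer compares acoustic models, not symbol strings):

    def edit_distance(a, b):
        """Standard Levenshtein distance between two phoneme lists."""
        prev = list(range(len(b) + 1))
        for i, x in enumerate(a, 1):
            cur = [i]
            for j, y in enumerate(b, 1):
                cur.append(min(prev[j] + 1,              # deletion
                               cur[j - 1] + 1,           # insertion
                               prev[j - 1] + (x != y)))  # substitution
            prev = cur
        return prev[-1]

    # Rough phoneme spellings, approximated by hand for illustration only.
    vocab = {"nine": "N AY N".split(),
             "yes":  "Y EH S".split(),
             "no":   "N OW".split(),
             "operator": "AA P ER EY T ER".split()}

    def force_choice(phones):
        """Map any utterance to the closest in-vocabulary word."""
        return min(vocab, key=lambda w: edit_distance(phones, vocab[w]))

    print(force_choice("F AY N".split()))             # "fine"      -> nine
    print(force_choice("HH AE M B ER G ER".split()))  # "hamburger" -> operator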
-a