Re: How do screen readers work when there's a console and a normal window?

I wouldn't worry about setting anything from your app.  At all.  The user should have, and probably will have, done this right for their setup.  Keeping the console output might be interesting and worthwhile as long as it doesn't make the program more complicated--I don't personally see a use for it, but someone else might.
As for naturalness, our definitions are, believe it or not, the same.  We just don't care about it; many of us are more interested in clarity.  This is a bit long and arguably off topic, but it may interest you:
There are two main models of speech synthesis.  The first of these is what you hear as natural and is called concatenative synthesis.  You basically take a bunch of wave files representing phonemes, made using a recording studio and a bunch of manual editing, and stick them together.  On top of this, you throw a bunch of math at it to make it sound natural, and then you apply a variety of algorithms that can speed up recordings of speech for the rate adjustment.  I'm simplifying, mostly because doing this properly is a black art and a trade secret and all of that, and I don't have the background to build one yet (give me a couple more years).  This is the realm of people like Nuance, with their ability to build truly enormous databases.  Any synth you're aware of that you think of as natural is most likely either using this technique or one based on it.  Ironically, making a very low quality one is something that any programmer who can concatenate lists can work out, though a database or other rule set for transforming text to phonemes is still needed.  At normal talking speeds, these sound really great.  But you can't crank them up without quality loss, no matter how much you wish you could.
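Just to make the "any programmer who can concatenate lists" point concrete, here's a minimal Python sketch of that very low quality approach.  It assumes you already have one WAV file per phoneme (all at the same sample rate) and some lexicon mapping words to phonemes; the file names and the tiny LEXICON table are hypothetical placeholders, and a real system would add smoothing at the joins, prosody, and a proper text-to-phoneme rule set:

import wave

# Hypothetical lexicon: word -> phoneme names; each phoneme has a matching
# WAV file (same sample rate and width) under phoneme_dir.
LEXICON = {
    "hello": ["hh", "eh", "l", "ow"],
    "world": ["w", "er", "l", "d"],
}

def synthesize(text, out_path="out.wav", phoneme_dir="phonemes"):
    frames = []
    params = None
    for word in text.lower().split():
        for phoneme in LEXICON.get(word, []):
            with wave.open(f"{phoneme_dir}/{phoneme}.wav", "rb") as clip:
                if params is None:
                    params = clip.getparams()  # reuse rate/width of the first clip
                frames.append(clip.readframes(clip.getnframes()))
    with wave.open(out_path, "wb") as out:
        out.setparams(params)
        for chunk in frames:
            out.writeframes(chunk)

synthesize("hello world")

All the actual quality lives in the parts this leaves out, which is why the good ones are trade secrets.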
The second method, the one most blind people prefer, is decidedly not natural.  There are a variety of mathematical models.  The most common of these are based on formant synthesis, which relies on (oversimplifying again, for lack of background, but give me a couple of years) adding sine waves.  You basically pull out the fundamental frequencies of the phonemes and play them back.  The trick, of course, is making it not sound like a flute and getting the transitions right--something that many scientists have spent a long time on.  Espeak uses this, as does Eloquence, though the algorithms behind Eloquence are more widely published and known (look up Klatt synthesis, if you're interested).  There are two advantages to this model.  The first, not so much of interest here, is that it requires far fewer system resources and can be anywhere from 100x to 10000x smaller (it depends on what you need to synthesize; I've seen an impressively low quality one in 1k of JavaScript once).  The second is that you don't crank up the speed at the end.  Instead, you adjust the words per minute setting, and literally all stages of the pipeline reconfigure themselves--it's an actual mathematical model of the human vocal tract, not just a big data problem.  Consequently you can literally run the whole thing faster without cheating at the end.  Synths like this don't typically require tricks until you want to go past 500 words a minute, and many don't require it even then.  As you turn them up, the loss of quality is very, very minimal.  The cost is naturalness.  They're quite, quite clear; you'd just never, ever, ever mistake them for humans, and there's very definitely an adjustment period.  Unfortunately, now that you can do stuff like waste a few hundred megabytes of RAM, and given that 90% of the population is more interested in speech being easily understood by anyone with no prior exposure, these are on the decline.  This is unfortunate for us, though it should be said that Espeak and the work NVDA has done on Speechplayer should be forward compatible for at least 20 years.
To draw an analogy, the former is like scaling an image to 200 times its original size.  The latter is like having a function that knows an algorithm to draw the same image at any size, without containing the image itself.  The second might be a bit blurry, but it's going to be pretty much the same amount of blurry for any size no matter how big you make it.
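And to show what I mean by the rate being part of the model rather than a post-processing trick, here's a toy Python sketch of the formant idea.  Each "vowel" is just a few sine waves at formant frequencies shaped by an envelope, and turning up the words per minute simply shortens the computed segments instead of time-stretching any audio.  The formant values are rough textbook numbers for /a/ and /i/, not taken from any real synth, and there's no glottal source or transition modeling, so it'll sound more like a flute chord than speech--which is exactly the hard part I mentioned:

import math, struct, wave

RATE = 22050  # output sample rate in Hz

# Very rough (F1, F2, F3) formant frequencies in Hz -- illustrative only.
VOWELS = {"a": (730, 1090, 2440), "i": (270, 2290, 3010)}

def render(phonemes, wpm=180, out_path="formant.wav"):
    # Faster speech just means shorter segments: the whole model is
    # recomputed for the new rate, nothing is time-stretched afterwards.
    seg_seconds = 60.0 / (wpm * 5)  # crude: pretend a word is ~5 segments
    samples = []
    for p in phonemes:
        n = int(seg_seconds * RATE)
        for i in range(n):
            t = i / RATE
            env = math.sin(math.pi * i / n)  # fade each segment in and out
            s = sum(math.sin(2 * math.pi * f * t) for f in VOWELS[p])
            samples.append(int(s / len(VOWELS[p]) * env * 16000))
    with wave.open(out_path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(RATE)
        out.writeframes(b"".join(struct.pack("<h", v) for v in samples))

render(["a", "i", "a"], wpm=180)  # try wpm=600: same tone quality, just shorter segments

The point of the wpm parameter is the whole point of the second model: nothing gets sped up after the fact, the segments are simply generated shorter, so clarity doesn't fall apart at high rates.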
