This is a rather different use case than what you've probably been thinking of for KVM. It could mean a significant improvement in the quality of life of disabled programmers like myself. It's difficult to convey what it's like to try to use computers with speech recognition for something other than writing, so bear with me when I say something is real but I can't quite prove it yet. Also, please take it as read that the only really usable speech recognition environment out there is NaturallySpeaking, with Google a close second in accuracy but not even on the same planet for extensibility in speech-enabled applications.

I'm trying to figure out ways of making it possible to drive Linux from Windows speech recognition (NaturallySpeaking). The goal is a system where Windows runs in a virtual machine on a Linux host, audio is passed through from a USB headset to the Windows environment, and the output of the recognition engine is piped through some magic back to the Linux host.

The hardest part of all of this, without question, is getting clean, uninterrupted audio from the USB device all the way through to the Windows virtual machine. VirtualBox and VMware mostly fail at delivering reliable audio to the virtual machine.

I expect KVM not to fare any better with clean audio/real-time USB, but I'm asking in case I'm wrong. If it doesn't work, or can't work yet, what would it take to make it possible for clean audio to be passed through to a guest?
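For reference, here's roughly the invocation I'd expect to try, assuming QEMU/KVM's USB host passthrough can hand the whole headset to the guest. The 1234:5678 is a placeholder for whatever lsusb reports as the headset's vendor:product ID, and I don't know yet whether this delivers gap-free audio, which is exactly my question:

  qemu-system-x86_64 -enable-kvm -m 2048 -hda windows.img \
      -usb -usbdevice host:1234:5678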

--- Why this is important, approaches that failed, and why I think this will work. Boring accessibility info ---

Attempts to make Windows- or DOS-based speech recognition drive Linux have a long and tortured history. Almost all of them involve some form of open-loop system that ignores system context and counts on the grammar to specify the context and the subsequent keystrokes injected into the target system.

This model fails because it reduces speech to emulating keyboard functions, which wastes the majority of the power of a good grammar in a speech recognition environment.

The most common configuration for speech recognition in a virtualized environment today has Windows as the host running speech recognition and Linux as the guest. It's just a reimplementation of the open-loop system described above, where your dictation results are keystrokes injected into the virtual machine's console window. Sometimes it works, sometimes it drops characters.

One big failing of the Windows host/Linux guest environment, in addition to dropping characters, is that it seems to drop segments of the audio stream on the Windows side. It's common but not frequent for this to happen anyway when running Windows under any sort of CPU load, but it's almost guaranteed as soon as a virtual machine starts up.

Another failing is that the only context the recognition application is aware of is the console window itself. It knows nothing about the internal context of the virtual machine (which application has focus). And unfortunately it can't know anything more, because of the way that NaturallySpeaking uses the local Windows context.

Inverting the relationship between guest and host, so that Linux is the host and Windows is the guest, solves at least the focus problem. In the virtual machine, you have a portal application that can control the perception of context and tunnel the character stream from the recognition engine into the host OS to drive it open loop. The portal application[1] can also communicate which grammar sequence has been parsed and what action should be taken on the host side. At this point we have the capabilities of a closed-loop speech recognition environment, where a grammar can read context to generate a new grammar that fits the application's state. This means smaller utterances that can be disambiguated, versus the more traditional large-utterance disambiguation technique.
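To make that concrete, here's a toy sketch of the host side of the tunnel. Nothing in it exists yet: the port, the KEYS:/CMD: line framing, and the xdotool injection are all my inventions for illustration. The portal in the guest would connect over TCP (with KVM's user-mode networking, the host is reachable from the guest at 10.0.2.2) and send either raw keystrokes for the open-loop path or parsed grammar results for the closed-loop path:

  import socket
  import subprocess

  PORT = 5678  # arbitrary; the portal in the guest connects here

  def handle_line(line):
      # Toy framing: "KEYS:<text>" is literal text to inject open-loop,
      # "CMD:<grammar result>" is a parsed utterance to act on closed-loop.
      kind, _, payload = line.partition(":")
      if kind == "KEYS":
          # inject into whatever X11 window has focus (requires xdotool)
          subprocess.call(["xdotool", "type", payload])
      elif kind == "CMD":
          # e.g. "CMD:emacs switch-buffer" -- dispatch to an application-
          # specific handler that could also report new context back to
          # the guest so the portal can regenerate the grammar
          print("grammar command:", payload)

  srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
  srv.bind(("0.0.0.0", PORT))
  srv.listen(1)
  conn, _ = srv.accept()
  buf = b""
  while True:
      data = conn.recv(4096)
      if not data:
          break
      buf += data
      while b"\n" in buf:
          raw, buf = buf.split(b"\n", 1)
          handle_line(raw.decode("utf-8"))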

A couple of other advantages of Windows as a guest: it runs only speech recognition and the portal. There are no browsers, no Flash, no JavaScript, no viruses or other "stuff" taking up resources and distracting from speech recognition working as well as possible. The downside is that the host running the virtual machine needs to give the VM very high, almost real-time priority[2] so that it doesn't stall and speech recognition works as quickly and as accurately as possible.
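On that last point: assuming the VM is a single qemu process, something as blunt as SCHED_FIFO via chrt might be enough to keep the recognizer fed. I haven't measured this; 50 is an arbitrary priority, and the binary name will vary by distribution:

  # give the running VM real-time FIFO priority 50
  chrt -f -p 50 $(pidof qemu-system-x86_64)
  # optionally pin it to one core
  taskset -cp 0 $(pidof qemu-system-x86_64)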

Hope I didn't bore you too badly. Thank you for reading, and I hope we can make this work.
--- eric



[1] Should I call it cake?
[2] I'm looking at you, Firefox, sucking down 30% of the CPU doing nothing.