This is a rather different use case than what you've probably been thinking of for KVM. It could mean a significant improvement in the quality of life of disabled programmers like myself. It's difficult to convey what it's like to try to use computers with speech recognition for something other than writing, so bear with me when I say something is real but I can't quite prove it yet. Also, please take it as read that the only really usable speech recognition environment out there is NaturallySpeaking, with Google a close second in accuracy but not even on the same planet for extensibility in speech-enabled applications.

I'm trying to figure out ways of making it possible to drive Linux from Windows speech recognition (NaturallySpeaking). The goal is a system where Windows runs in a virtual machine on a Linux host, audio is passed through from a USB headset to the Windows environment, and the output of the recognition engine is piped through some magic back to the Linux host.

The hardest part of all of this, without question, is getting clean, uninterrupted audio from the USB device all the way through to the Windows virtual machine. VirtualBox and VMware mostly fail at delivering reliable audio to the virtual machine.

I expect KVM not to fare any better with clean audio/real-time USB, but I'm asking in case I'm wrong. If it doesn't work, or can't work yet, what would it take to make it possible for clean audio to be passed through to a guest?
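For reference, here's roughly the invocation I'd expect to try, assuming QEMU/KVM's USB host passthrough can hand the whole headset to the guest. The 1234:5678 is a placeholder for whatever lsusb reports as the headset's vendor:product ID, and I don't know yet whether this delivers gap-free audio, which is exactly my question:

  qemu-system-x86_64 -enable-kvm -m 2048 -hda windows.img \
      -usb -usbdevice host:1234:5678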

--- Why this is important, approaches that failed, and why I think this will work. Boring accessibility info ---

Attempts to make Windows- or DOS-based speech recognition drive Linux have a long and tortured history. Almost all of them involve some form of open-loop system that ignores system context and counts on the grammar to specify the context and the subsequent keystrokes injected into the target system.

This model fails because it reduces speech to emulating keyboard functions, which wastes the majority of the power of a good grammar in a speech recognition environment.

The most common configuration for speech recognition in a virtualized environment today has Windows as the host running speech recognition and Linux as the guest. It's just a reimplementation of the open-loop system described above, where your dictation results are keystrokes injected into the virtual machine's console window. Sometimes it works, sometimes it drops characters.

One big failing of the Windows host/Linux guest environment, in addition to dropping characters, is that it seems to drop segments of the audio stream on the Windows side. It's common but not frequent for this to happen anyway when running Windows under any sort of CPU load, but it's almost guaranteed as soon as a virtual machine starts up.

Another failing is that the only context the recognition application is aware of is the console window itself. It knows nothing about the internal context of the virtual machine (which application has focus). And unfortunately it can't know anything more, because of the way that NaturallySpeaking uses the local Windows context.

Inverting the relationship between guest and host, so that Linux is the host and Windows is the guest, solves at least the focus problem. In the virtual machine, you have a portal application that can control the perception of context and tunnel the character stream from the recognition engine into the host OS to drive it open loop. The portal application[1] can also communicate which grammar sequence has been parsed and what action should be taken on the host side. At this point we have the capabilities of a closed-loop speech recognition environment, where a grammar can read context to generate a new grammar that fits the application's state. This means smaller utterances that can be disambiguated, versus the more traditional large-utterance disambiguation technique.
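To make that concrete, here's a toy sketch of the host side of the tunnel. Nothing in it exists yet: the port, the KEYS:/CMD: line framing, and the xdotool injection are all my inventions for illustration. The portal in the guest would connect over TCP (with KVM's user-mode networking, the host is reachable from the guest at 10.0.2.2) and send either raw keystrokes for the open-loop path or parsed grammar results for the closed-loop path:

  import socket
  import subprocess

  PORT = 5678  # arbitrary; the portal in the guest connects here

  def handle_line(line):
      # Toy framing: "KEYS:<text>" is literal text to inject open-loop,
      # "CMD:<grammar result>" is a parsed utterance to act on closed-loop.
      kind, _, payload = line.partition(":")
      if kind == "KEYS":
          # inject into whatever X11 window has focus (requires xdotool)
          subprocess.call(["xdotool", "type", payload])
      elif kind == "CMD":
          # e.g. "CMD:emacs switch-buffer" -- dispatch to an application-
          # specific handler that could also report new context back to
          # the guest so the portal can regenerate the grammar
          print("grammar command:", payload)

  srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
  srv.bind(("0.0.0.0", PORT))
  srv.listen(1)
  conn, _ = srv.accept()
  buf = b""
  while True:
      data = conn.recv(4096)
      if not data:
          break
      buf += data
      while b"\n" in buf:
          raw, buf = buf.split(b"\n", 1)
          handle_line(raw.decode("utf-8"))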

A couple of other advantages of Windows as a guest: it runs only speech recognition and the portal. There are no browsers, no Flash, no JavaScript, no viruses or other "stuff" taking up resources and distracting from speech recognition working as well as possible. The downside is that the host running the virtual machine needs to give the VM very high, almost real-time priority[2] so that it doesn't stall and speech recognition works as quickly and as accurately as possible.
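On that last point: assuming the VM is a single qemu process, something as blunt as SCHED_FIFO via chrt might be enough to keep the recognizer fed. I haven't measured this; 50 is an arbitrary priority, and the binary name will vary by distribution:

  # give the running VM real-time FIFO priority 50
  chrt -f -p 50 $(pidof qemu-system-x86_64)
  # optionally pin it to one core
  taskset -cp 0 $(pidof qemu-system-x86_64)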

Hope I didn't bore you too badly. Thank you for reading, and I hope we can make this work.
--- eric



[1] Should I call it cake?
[2] I'm looking at you, Firefox, sucking down 30% of the CPU doing nothing.