Robert Michel wrote:
> Salve Richard!
>> I like your ideas here,
> ;)
>
>> it definitely looks feasible to support a
>> small subset of voice-commands.. "Yes/No/Again/Next" could be
>> standardised and available to all applications.
>
> AFAIK a small number of different voice commands will have a good
> recognition rate.
>

IBM released a modified multimodal Opera web browser for the older-style
Zaurus (Embedix Linux) that supports voice interaction tags: "WebSphere
Everyplace Multimodal Environment". I've played around with it, and it
works pretty well. By using XML (XHTML plus VoiceXML, actually) and
defining limited-domain voice tags within a document, it can distinguish
spoken numbers, names, pizza toppings, etc. without training.

The engine should be able to handle a screenful of 9-16 icons by name
plus basic menus, for example. As long as each item consists of a
distinct series of phonemes, it's smooth. (It doesn't need to hear the
difference between 'whiter' and 'writer'; it's not speech-to-text.)

I for one have found the 'voice tags' on my cell phones irritating, but
I have always wanted to be able to just recite a number and store or
dial it, or fire up the calculator and run some calculations, without
pressing buttons or navigating menus.

Between that and Flite (Festival Lite speech synthesis engine, available
for the Zaurus and various ARM-Linux distros) you have the underpinnings
of some very interesting possibilities.

j
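
To illustrate the limited-domain point above, here is a rough Python sketch of why a small vocabulary with distinct phoneme sequences is forgiving. It is not how IBM's engine works; the vocabulary, the phoneme spellings, the `match_command` helper and the 0.6 threshold are all made up for illustration, and the similarity score just comes from Python's standard `difflib`.

```python
from difflib import SequenceMatcher

# Hypothetical limited-domain vocabulary: command -> rough phoneme sequence.
# The phoneme spellings are illustrative only, not taken from any real engine.
VOCABULARY = {
    "yes":   ["Y", "EH", "S"],
    "no":    ["N", "OW"],
    "again": ["AH", "G", "EH", "N"],
    "next":  ["N", "EH", "K", "S", "T"],
}


def similarity(a, b):
    """Return a 0..1 similarity score between two phoneme sequences."""
    return SequenceMatcher(None, a, b).ratio()


def match_command(heard, vocabulary=VOCABULARY, threshold=0.6):
    """Return the vocabulary entry closest to the heard phonemes.

    Returns None when nothing in the vocabulary scores at least `threshold`
    (an assumed cut-off, chosen arbitrarily for this sketch).
    """
    best_name, best_score = None, 0.0
    for name, phonemes in vocabulary.items():
        score = similarity(heard, phonemes)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None
```

With only four well-separated entries, `match_command(["N", "EH", "K", "S"])` still picks "next" even though the final "T" was dropped, while something unrelated such as `["Z", "UW", "M"]` is rejected. The matcher never has to decide what word was actually said, only which of the few allowed items is closest.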