Eric Johansson, on Mon 18 Mar 2019 15:50:23 -0400, wrote:
> On 3/18/2019 2:46 PM, Samuel Thibault wrote:
> >> Are there any capabilities in at-spi that allow a speech recognition
> >> environment to query the application and find out enough context to
> >> be able to generate appropriate grammars? For example, using a
> >> multipurpose editor, I may want to have different grammars for
> >> different types of data. I need to know which tab has focus and
> >> something about that tab (TBD) so that I can generate the right
> >> grammar to operate on data within that tab.
>
> > At-spi provides information about which widget has focus, and then
> > with at-spi you can inspect the list of actions.
>
> A speech driven environment only partially cares about focus.

With at-spi, from the widget which has focus, you can walk around the
whole tab.
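For instance, something along these lines can watch which widget gets
the focus, find the tab it lives in, and list the actions it exposes
(a rough sketch using pyatspi, the Python AT-SPI2 bindings, and assuming
desktop accessibility support is enabled; a real speech environment
would of course do much more with the information):

    # Sketch: track the focused widget, locate its enclosing page tab,
    # and enumerate the actions it exposes through AT-SPI.
    import pyatspi

    def enclosing_page_tab(acc):
        # Walk up the accessible tree looking for the notebook page
        # (ROLE_PAGE_TAB) that contains this widget, if any.
        while acc is not None:
            if acc.getRole() == pyatspi.ROLE_PAGE_TAB:
                return acc
            acc = acc.parent
        return None

    def on_focus_changed(event):
        # detail1 == 1 means the widget gained focus (not lost it).
        if not event.detail1:
            return
        acc = event.source
        print("focused: %s (%s)" % (acc.name, acc.getRoleName()))

        tab = enclosing_page_tab(acc)
        if tab is not None:
            print("  in tab: %s" % tab.name)

        # List the actions attached to the focused widget.
        try:
            action = acc.queryAction()
        except NotImplementedError:
            return
        for i in range(action.nActions):
            print("  action %d: %s (key binding: %s)"
                  % (i, action.getName(i), action.getKeyBinding(i)))

    pyatspi.Registry.registerEventListener(on_focus_changed,
                                           "object:state-changed:focused")
    pyatspi.Registry.start()

From there, the speech environment can decide which grammar to load for
the widget (and the tab it is in), and which actions to bind to spoken
commands.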
> But you remember you need to send an email message so you say "take a
> message".

That could be an action attached to a global shortcut, which could be
inspected as well.

> If you look at the screen and you see something you want to do, you
> should be able to say it.

That could also be inspected through at-spi.

> >> There are a bunch of other types such as email address, phone
> >> number, URL, people and other limited board non-English language
> >> elements that could be spoken, each of which needs its own grammar.
>
> > The exact allowed grammar could be passed through at-spi.
>
> The grammar wouldn't be passed through at-spi.

? It has to be exposed somehow by the application. I don't think we want
to enumerate in at-spi all the kinds of input that can be set in a field
(number, e-mail, telephone number with all kinds of formats, etc.) while
the application can just pass the grammar it already knows about.

Or maybe we just don't mean the same thing by "grammar". I'm taking it
in the computer-science sense: for a number, the grammar would be
[0-9]*.

> One thing I've learned in building speech interfaces is that the
> grammar is incredibly personal.

We then need to relate this to the actual inputs in the applications.

> To me the answer is giving the end user the ability to modify the
> grammar to fit how they speak.

How would they express it?

> >> One of the problems though with the notion database is that there
> >> are no row names except by convention. Therefore whenever you use a
> >> name for a cell, somebody needs to keep track of the row you are on,
> >> and no command should take you off of that row.
>
> > I'm not sure I understand.
>
> Picture a spreadsheet. The spreadsheet has numbers across the top, one
> through infinity, but on the side, instead of the usual A-Z, there's
> nothing. So when you operate on that row, you can only operate on
> horizontally adjacent cells and not refer to anything above or below.

Doesn't it work to use expressions such as "row below" or "row #2"?

> >> The last issue is a method of bypassing JavaScript-backed editors. I
> >> cannot dictate into Google Docs, and have difficulty dictating into
> >> LinkedIn. In the browser context, only naked text areas seem to work
> >> well with NaturallySpeaking.
>
> > That is the kind of example where plugging in only through at-spi can
> > fail, when the application is not actually exposing an at-spi
> > interface, and thus plugging in at the OS level can be a useful
> > option.
>
> But we still have the same problem of the speech environment needing to
> reach into the application to understand the contents of buffers, what
> commands are accessible in the application context, etc.

Yes, but both ways can cooperate.

> Think of it as a software endoscopy or colonoscopy.

The problem is that there will always be software that doesn't allow it.
So you also want a generic solution.

Samuel
_______________________________________________
gnome-accessibility-list mailing list
gnome-accessibility-list@gnome.org
https://mail.gnome.org/mailman/listinfo/gnome-accessibility-list