Re: [chromium-dev] Keyboard codes to use for a chrome extension api that enables automation?

Simon Stewart Thu, 14 Jan 2010 18:16:28 -0800

On Thu, Jan 14, 2010 at 4:15 PM, Dominic Mazzoni <dmazz...@google.com> wrote:
> On Thu, Jan 14, 2010 at 8:01 AM, Simon Stewart
> <simon.m.stew...@gmail.com> wrote:
>> The easiest thing for someone who's attempting to use the
>> accessibility API may be to avoid using keycodes, and instead allow
>> the input of the desired displayed value. The advantage of this would
>> be to allow the input of internationalized characters that would
>> otherwise need IME to be input.
>
> While this could be really useful for international characters, the
> main use I had in mind was "action" keystrokes, like tab, arrow keys,
> enter, Ctrl-+, etc. - so for these I think a keycode is needed, right?


The keycode is essentially an indication of which physical key was
pressed on the keyboard, not which letter the user means to input
(where the charcode may be more useful). Even on a standard US
keyboard, it's possible to get the same letter via different keycodes
just by changing the layout used. Throw in international keyboards
(such as my UK one, or my colleagues' German ones) and you're in a
whole world of pain and discomfort. Add choices such as mapping the
caps lock key to "ctl" and the picture becomes muddier still. Worse,
unless you provide constants, not many people know which keycode
"normally" maps to which key.
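
To make the layout problem concrete, here is a minimal sketch in Python. The physical-key identifiers (KEY_Y, KEY_Z, KEY_2) and the tiny layout tables are hypothetical names made up for illustration, not real keycodes from any platform; only the layout facts themselves (Y and Z swapped on German keyboards, shift-2 giving a double quote on UK and German keyboards) are taken as given.

```python
# Hypothetical physical-key identifiers, named after their US QWERTY legends.
# Each layout maps (physical key, shift held?) -> the character produced.
LAYOUTS = {
    "us": {("KEY_Y", False): "y", ("KEY_Z", False): "z", ("KEY_2", True): "@"},
    "de": {("KEY_Y", False): "z", ("KEY_Z", False): "y", ("KEY_2", True): '"'},
    "uk": {("KEY_Y", False): "y", ("KEY_Z", False): "z", ("KEY_2", True): '"'},
}


def char_for(layout, key, shift=False):
    """Character produced by pressing a physical key under a given layout."""
    return LAYOUTS[layout][(key, shift)]


def keys_for(layout, char):
    """Physical key presses that would produce `char` under a given layout."""
    return [press for press, c in LAYOUTS[layout].items() if c == char]


# The same physical key yields different letters on different layouts,
# so a test written with keycodes silently changes meaning between machines.
print(char_for("us", "KEY_Y"), char_for("de", "KEY_Y"))
print(keys_for("us", "z"), keys_for("de", "z"))
```

An API that accepts the desired character instead has to do the keys_for lookup itself, but the caller no longer needs to know which layout is active.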

Which is a painfully long way of saying that I really think that the
keycode is a poor choice.

Fortunately, there are at least two pieces of prior art here:

1) MS's "SendKeys" function, which takes a formatted string

2) WebDriver's mechanism for performing similar "action" keystrokes.

I wrote WebDriver's implementation in order to allow testing of a
complex application which demanded high fidelity simulation of user
input. This seems to broadly overlap your stated goal, so here's the
gist of what we do:

* All keyboard input is modeled as normal strings
* In the simple case of a lower-case string, we do a direct mapping
from character to keystroke
* In the case where a capital letter is shown, we automatically
simulate holding down the shift key
* The same applies to any character on the current keyboard layout
which can be typed by holding down the shift key

* We have defined constants in a Unicode PUA (Private Use Area) for
meta characters, such as control, shift, tab, escape and so on.
* Meta characters are assumed to be "sticky": if we see the "ctl"
character, we assume that the control key is held down until we see
another "ctl" character (where we release the key), the end of the
string, or a "release all meta keys" character. In the latter two
cases all meta keys are released in the reverse order they were held
down in (so the most recently pressed is released first).
* For characters which have well known mappings in strings ("\n",
"\t") we simulate using the expected key

The advantage of this scheme is that the common case (of a user simply
typing into a textarea or input element) is easy to read and makes
sense to someone not familiar with the test or the API. The more
complex case of a user wanting to simulate pressing a normal keyboard
shortcut is also handled intuitively because of the behaviour seen
when we reach the end of a string. Finally, it is possible to encode
relatively complex chording and interactions using this mechanism.
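
The scheme described above can be sketched roughly as follows. This is an illustrative Python sketch, not WebDriver's actual implementation: the function name to_events, the event tuples, the specific PUA code points and the small US-layout table of shifted characters are all assumptions chosen for the example.

```python
# Assumed PUA code points for meta characters (illustrative values only).
RELEASE_ALL = "\ue000"  # the "release all meta keys" character
SHIFT = "\ue008"
CONTROL = "\ue009"
ALT = "\ue00a"
ESCAPE = "\ue00c"

STICKY = {SHIFT: "shift", CONTROL: "ctl", ALT: "alt"}
# Characters with well-known mappings, plus non-sticky named keys.
NAMED = {"\n": "enter", "\t": "tab", ESCAPE: "escape"}
# Assumed subset of shifted characters on a US layout.
SHIFTED = {"!": "1", "@": "2", "?": "/"}


def to_events(text):
    """Expand a string into a list of ("down" | "up", key) events."""
    events = []
    held = []  # sticky meta keys currently down, in the order pressed

    def release_all():
        # Most recently pressed meta key is released first.
        while held:
            events.append(("up", held.pop()))

    for ch in text:
        if ch == RELEASE_ALL:
            release_all()
        elif ch in STICKY:
            name = STICKY[ch]
            if name in held:  # second occurrence releases the key
                held.remove(name)
                events.append(("up", name))
            else:  # first occurrence holds it down
                held.append(name)
                events.append(("down", name))
        else:
            if ch in NAMED:
                key, needs_shift = NAMED[ch], False
            elif ch in SHIFTED:
                key, needs_shift = SHIFTED[ch], True
            elif ch.isupper():
                key, needs_shift = ch.lower(), True
            else:
                key, needs_shift = ch, False
            # Simulate shift automatically unless it is already held.
            auto_shift = needs_shift and "shift" not in held
            if auto_shift:
                events.append(("down", "shift"))
            events.append(("down", key))
            events.append(("up", key))
            if auto_shift:
                events.append(("up", "shift"))

    release_all()  # end of string releases everything still held
    return events


# Plain typing reads like plain text; a shortcut needs no explicit release.
print(to_events("Hi"))
print(to_events(CONTROL + SHIFT + "t"))
```

Under these assumptions, to_events(CONTROL + SHIFT + "t") presses ctl, then shift, then taps "t", and releases shift before ctl when the string ends, which is the behaviour that makes ordinary shortcuts read naturally.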

Example usages of this API can be seen here:

http://selenium.googlecode.com/svn/trunk/common/test/js/typing_test.html

FWIW, WebDriver's API is already available for Firefox via JS, and
work is already under way to get it working with Chrome. I'd be happy
to provide more information.

Regards,

Simon

-- 
Chromium Developers mailing list: chromium-dev@googlegroups.com 
View archives, change email options, or unsubscribe: 
    http://groups.google.com/group/chromium-dev
