Paul LeoNerd Evans wrote on 2005-06-22 19:15 UTC:
> I also have another issue I'd appreciate some thoughts on. It relates
> to the way terminals send codes for key-presses. I have managed to find
> a setup for xterm, which sends real UTF-8 encoded characters as 8-bit
> values, but meta-keys using the ECMA-35 escape mechanism (e.g. Meta-m
> being ESC m). This is technically cheating, relying on what ECMA-35 says
> should be two equivalent encodings, to have two different meanings. Bash
> (well, readline) is happy with this, but other programs (e.g. vim) see
> them both identically; either I can type UTF-8, or meta-keys, but not
> both.
> 
> I have a possible solution to this, based on how Thomas Dickey changed
> the way xterm sends modified cursor keys. Now, a normal right-arrow
> sends CSI C, but control+right arrow sends CSI 1;5C. This uses the
> second field of the ECMA-48 numeric parameters to encode the shift
> state, in a +1 representation of a bitmask. 1 = normal, 2 = shift, 3 =
> alt, 4 = shift + alt, etc... I was thinking that I could hijack this
> mechanism, and use either a new code altogether, or use the ~ function
> key code, as a modifier for the next normal character sent. So, where
> 
>   CSI 2~ == Insert
>   CSI 2;3~ == Alt+Insert
> 
> I could then use a special sequence + modifier to represent any modified
> key. I'm not sure quite if the spec allows 0 with ~, but if not I could
> use a new code. I would send e.g.
> 
>   CSI 0;3~ d == Alt+d
> 
> This also sits cleanly with UTF-8, so if anyone could type it, we could
> represent Alt+é as
> 
>   CSI 0;3~ é
> 
> Some other interesting related issues arise here; for
> instance, we could now represent Ctrl+R as
> 
>   CSI 0;5~ r
> 
> This now makes it possible to distinguish Ctrl+Shift+R by simply sending
> 
>   CSI 0;5~ R
> 
> Though, this does start to bring up backward-incompatible issues with
> programs expecting the older encoding for Ctrl+R (being the ASCII code
> for R, bitwise ANDed with 0x1f). Perhaps the new scheme would send an
> older code if it is properly representable like that, and a newer one if
> not.
>
> Finally, I realise this new scheme would not be fully
> backward-compatible, and would require changes in the terminal input
> layer of any program I wished to use with it. That said, current schemes
> involving encodings of the HOME/END keys, etc... are not quite
> intercompatible, nor does Thomas Dickey's new modified cursor scheme yet
> actually work in most programs. So since these issues would need fixing
> anyway, I feel it is an appropriate moment to go on and fix the whole
> problem properly.
> 
> Specifically, I note that pressing (e.g.) Ctrl+Left in bash/readline,
> results in bash thinking it read ;5D from the user, and prints as if the
> keys ";", "5" and "D" had been typed in succession. From what I
> understand of ECMA-35, I believe this to be a bug in readline (or
> ncurses), and it should in fact be read as a left arrow, albeit ignoring
> the Ctrl+ part.
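
To illustrate the point about ";5D": ECMA-48 defines a control sequence
as CSI, then parameter bytes (0x30-0x3F), then intermediate bytes
(0x20-0x2F), then one final byte (0x40-0x7E). A reader that consumes the
whole sequence never leaks ";5D" into the input. The following is only a
rough Python sketch; the function name and the returned tuple are made
up for illustration, and it assumes the 7-bit form of CSI (ESC [) and
xterm's "1 + bitmask" convention for the second parameter.

```python
import re

# ESC [  parameter bytes 0x30-0x3F  intermediate bytes 0x20-0x2F  final byte 0x40-0x7E
CSI = re.compile(r"\x1b\[([0-?]*)([ -/]*)([@-~])")

def read_cursor_key(text):
    """Parse one control sequence at the start of `text`.

    Returns (final_byte, modifier_mask, length_consumed), or None if
    `text` does not start with a complete control sequence.
    Hypothetical helper, not taken from readline or ncurses.
    """
    m = CSI.match(text)
    if m is None:
        return None
    params, _intermediates, final = m.groups()
    fields = params.split(";") if params else []
    # Assumption: second field is 1 + bitmask, as xterm sends it.
    if len(fields) > 1 and fields[1].isdigit():
        mask = int(fields[1]) - 1
    else:
        mask = 0
    return final, mask, m.end()
```

A program that does not care about modifiers can drop the mask and still
treat CSI 1;5D as a plain left arrow.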

Interesting idea. It is clear that in the UTF-8 age, the old "Alt"
interpretation as "add 128 to any character entered" is no longer
applicable and sustainable. We need something new from scratch. A valid
CSI sequence that gets prefixed to any key to signal the currently
pressed modifiers (in case there is no other established way of
interpreting that modifier combination) seems a rather good and clean
approach to me.
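
As a rough sketch of what such a prefix could look like on the wire,
here is a small Python encoder and decoder for the form proposed above
(CSI 0;N~ followed by one character). All names are hypothetical, the
input is assumed to be already decoded from UTF-8 into a string, CSI is
assumed to be sent as ESC [, and N uses the "1 + bitmask" convention
from the quoted proposal. Nothing here is an established standard.

```python
import re

# Modifier bits as used by xterm for modified cursor keys.
SHIFT, ALT, CTRL = 1, 2, 4

# Hypothetical prefix: CSI 0 ; <param> ~ followed by exactly one character.
PREFIX = re.compile(r"\x1b\[0;(\d+)~(.)", re.DOTALL)

def encode_key(char, mask):
    """Return the proposed sequence for `char` pressed with modifiers `mask`."""
    if mask == 0:
        return char
    return f"\x1b[0;{mask + 1}~{char}"

def decode_keys(text):
    """Split decoded terminal input into (char, mask) pairs."""
    keys = []
    pos = 0
    while pos < len(text):
        m = PREFIX.match(text, pos)
        if m:
            # Parameter is 1 + bitmask in this sketch.
            keys.append((m.group(2), int(m.group(1)) - 1))
            pos = m.end()
        else:
            keys.append((text[pos], 0))
            pos += 1
    return keys
```

Because the modified character follows the prefix unchanged, any UTF-8
character can carry a modifier without touching its encoding.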

I don't believe there is any need to add +1 to the bitmask, as the digit
'0' is a perfectly valid parameter byte in any ECMA-48 control sequence.
I have to check the standard and some implementations first before I can
comment on whether using ~ as the final byte for this is really the best
choice, but I like the general idea in principle of standardising a
control sequence for signalling modifier status.
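
To make the difference concrete, the following Python sketch decodes
the modifier parameter under both conventions: xterm's "1 + bitmask"
and a plain bitmask where 0 means no modifier. The bit values
(Shift = 1, Alt = 2, Ctrl = 4) follow the xterm scheme described in the
quoted text; the function names are invented for this example.

```python
SHIFT, ALT, CTRL = 1, 2, 4
NAMES = [(CTRL, "Ctrl"), (ALT, "Alt"), (SHIFT, "Shift")]

def mask_from_param(param, plus_one=True):
    """Convert a numeric parameter string into a modifier bitmask.

    plus_one=True  : xterm-style, parameter is 1 + bitmask
    plus_one=False : plain bitmask, parameter 0 means "no modifier"
    """
    value = int(param)
    return value - 1 if plus_one else value

def describe(mask):
    """Return a readable name for the modifier bitmask."""
    return "+".join(name for bit, name in NAMES if mask & bit) or "none"
```

With the plain bitmask, Ctrl would be sent as parameter 4 rather than 5,
and the parameter can be tested bit by bit without a subtraction first.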

http://www.ecma-international.org/publications/standards/Ecma-048.htm

Markus

(Cc-ed to linux-utf8, where there have previously been discussions on this.)

-- 
Markus Kuhn, Computer Laboratory, University of Cambridge
http://www.cl.cam.ac.uk/~mgk25/ || CB3 0FD, Great Britain


--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
