Paul LeoNerd Evans wrote on 2005-06-22 19:15 UTC:
> I also have another issue I'd appreciate some thoughts on. It relates
> to the way terminals send codes for key-presses. I have managed to find
> a setup for xterm, which sends real UTF-8 encoded characters as 8-bit
> values, but meta-keys using the ECMA-35 escape mechanism (e.g. Meta-m
> being ESC m). This is technically cheating, relying on what ECMA-35 says
> should be two equivalent encodings, to have two different meanings. Bash
> (well, readline) is happy with this, but other programs (e.g. vim) see
> them both identically; either I can type UTF-8, or meta-keys, but not
> both.
>
> I have a possible solution to this, based on how Thomas Dickey changed
> the way xterm sends modified cursor keys. Now, a normal right-arrow
> sends CSI C, but control+right arrow sends CSI 1;5C. This uses the
> second field of the ECMA-35 numeric parameters to encode the shift
> state, in a +1 representation of a bitmask. 1 = normal, 2 = shift, 3 =
> alt, 4 = shift + alt, etc... I was thinking that I could hijack this
> mechanism, and use either a new code altogether, or use the ~ function
> key code, as a modifier for the next normal character sent. So, where
>
>     CSI 2~ == Insert
>     CSI 2;3~ == Alt+Insert
>
> I could then use a special sequence + modifier to represent any modified
> key. I'm not sure quite if the spec allows 0 with ~, but if not I could
> use a new code. I would send e.g.
>
>     CSI 0;3~ d == Alt+d
>
> This also sits cleanly with UTF-8, so if anyone could type it, we could
> represent Alt+é as
>
>     CSI 0;3~ é
>
> There do become some other interesting related issues here; for
> instance, we could now represent Ctrl+R as
>
>     CSI 0;5~ r
>
> This now makes it possible to distinguish Ctrl+Shift+R by simply sending
>
>     CSI 0;5~ R
>
> Though, this does start to bring up backward-incompatible issues with
> programs expecting the older encoding for Ctrl+R (being the ASCII code
> for R, bitwise ANDed with 0x1f). Perhaps the new scheme would send an
> older code if it is properly representable like that, and a newer one if
> not.
>
> Finally, I realise this new scheme would not be fully
> backward-compatible, and would require changes in the terminal input
> layer of any program I wished to use with it. That said, current schemes
> involving encodings of the HOME/END keys, etc... are not quite
> intercompatible, nor does Thomas Dickey's new modified cursor scheme yet
> actually work in most programs. So since these issues would need fixing
> anyway, I feel it is an appropriate moment to go on and fix the whole
> problem properly.
>
> Specifically, I note that pressing (e.g.) Ctrl+Left in bash/readline,
> results in bash thinking it read ;5D from the user, and prints as if the
> keys ";", "5" and "D" had been typed in succession. From what I
> understand of ECMA-35, I believe this to be a bug in readline (or
> ncurses), and it should in fact be read as a left arrow, albeit ignoring
> the Ctrl+ part.
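
To make the quoted proposal concrete, here is a minimal Python sketch of
the two encodings it describes: the +1 modifier parameter that xterm
already uses for modified cursor keys, and the suggested CSI 0;<mod>~
prefix. The function names, the bit values SHIFT=1, ALT=2, CTRL=4
(inferred from "1 = normal, 2 = shift, 3 = alt" and from CSI 1;5C for
control), and the rule for falling back to the legacy Ctrl code are
assumptions made for illustration, not part of any specification. The
spaces in "CSI 0;3~ d" are taken to be notation only.

```python
# Assumed modifier bits, inferred from the +1 values listed in the quote.
SHIFT, ALT, CTRL = 1, 2, 4

CSI = "\x1b["  # 7-bit representation of CSI (ESC [)


def encode_modifiers(mask: int) -> int:
    """Return the +1 parameter value for a modifier bitmask (0 -> 1)."""
    return mask + 1


def decode_modifiers(param: int) -> int:
    """Recover the modifier bitmask from a +1 parameter value."""
    return param - 1


def legacy_ctrl(ch: str) -> str:
    """Traditional Ctrl encoding: the ASCII code ANDed with 0x1f."""
    return chr(ord(ch) & 0x1F)


def encode_key(ch: str, mask: int) -> str:
    """Hypothetical encoder for the proposed 'CSI 0;<mod>~ <char>' prefix.

    Assumption: plain Ctrl + lowercase ASCII letter counts as "properly
    representable" in the old scheme and keeps its legacy code; every
    other modified key gets the new prefix.
    """
    if mask == 0:
        return ch
    if mask == CTRL and ch.isascii() and ch.islower():
        return legacy_ctrl(ch)
    return f"{CSI}0;{encode_modifiers(mask)}~{ch}"
```

Under these assumptions, encode_key("d", ALT) gives ESC [ 0 ; 3 ~ d,
encode_key("r", CTRL) stays the single byte 0x12, and
encode_key("R", CTRL) gives ESC [ 0 ; 5 ~ R, which is how the proposal
tells Ctrl+Shift+R apart from Ctrl+R.
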

Interesting idea. It is clear that in the UTF-8 age, the old "Alt"
interpretation as "add 128 to any character entered" is no longer
applicable and sustainable. We need something new from scratch.

A valid CSI sequence that gets prefixed to any key to signal the
currently pressed modifiers (in case there is no other established way
of interpreting that modifier combination) seems a rather good and clean
approach to me. I don't believe there is any need to add +1 to the
bitmask, as the digit '0' is a perfectly valid parameter byte in any
ECMA-48 control sequence.

I have to check the standard and some implementations first before I can
comment on whether using ~ as the final byte for this is really the best
choice, but I like the general idea in principle of standardising a
control sequence for signalling modifier status.

http://www.ecma-international.org/publications/standards/Ecma-048.htm

Markus

(Cc-ed to linux-utf8, where there have previously been discussions on
this.)

-- 
Markus Kuhn, Computer Laboratory, University of Cambridge
http://www.cl.cam.ac.uk/~mgk25/ || CB3 0FD, Great Britain

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
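
The remark about '0' being a valid parameter byte, and the ;5D symptom
reported for readline, both come down to the ECMA-48 control sequence
grammar: after CSI come any number of parameter bytes (0x30-0x3F), then
any number of intermediate bytes (0x20-0x2F), then exactly one final
byte (0x40-0x7E). Below is a minimal Python sketch of a reader that
consumes a whole sequence on that basis. The name parse_csi and its
return shape are hypothetical, and only the 7-bit ESC [ form of CSI is
handled.

```python
CSI = "\x1b["  # 7-bit representation of CSI (ESC [)


def parse_csi(data: str, start: int = 0):
    """Parse one control sequence beginning at data[start].

    Returns (params, intermediates, final, end), where end is the index
    just past the final byte, or None if no complete sequence starts
    there. Empty or non-numeric parameters are reported as None.
    """
    if not data.startswith(CSI, start):
        return None
    i = start + len(CSI)

    j = i
    while j < len(data) and 0x30 <= ord(data[j]) <= 0x3F:  # parameter bytes
        j += 1

    k = j
    while k < len(data) and 0x20 <= ord(data[k]) <= 0x2F:  # intermediate bytes
        k += 1

    # A final byte must follow; otherwise the sequence is incomplete or invalid.
    if k >= len(data) or not 0x40 <= ord(data[k]) <= 0x7E:
        return None

    params = [int(p) if p.isdigit() else None for p in data[i:j].split(";")]
    return params, data[j:k], data[k], k + 1
```

With a reader like this, CSI 1;5D is consumed as one sequence whose
final byte is D, so a program that does not understand the modifier
parameter can still treat it as a left arrow instead of echoing ";5D".
CSI 0;3~ likewise parses cleanly with a first parameter of 0.
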
