On Mon, 6 Aug 2001, Jim Breen wrote:
> I find that amazing. I have used kterm for about 10 years with editors
> like jvim and jstevie, not to mention non-Unix editors like Word, etc.
> with Japanese. In all cases backspace erases the *whole* character; not
> just half it. I have been working with Japanese people editing text, and
> in all cases they obviously expected backspace and del to remove a
> complete character.

I was talking about the protocol between the editor and the terminal; you
are talking about the protocol between the keyboard user and the editor.

   user -------> editor -------> terminal
        keyboard        tty pipe

The two are very different here, though they both use the same encoding,
namely UTF-8 and C0 control characters.  Jvim and friends add additional
logic to translate a single pressed backspace on the keyboard into two
backspaces that get sent to the terminal (if they use backspace at all and
not coordinate-based positioning).
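
A minimal sketch of that extra logic, in Python, for an editor that uses
backspace rather than coordinate-based positioning. The function names are
hypothetical, the width rule is a simplification (combining characters are
ignored), and overwriting the cell with spaces is my assumption about how the
glyph gets blanked, not something jvim is known to do:

```python
import unicodedata

def display_width(ch: str) -> int:
    """Columns occupied by ch: 2 for East Asian Wide/Fullwidth, else 1."""
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1

def erase_sequence(ch: str) -> str:
    """What to send to the terminal to rub out ch just left of the cursor.

    One key press by the user becomes one BS per occupied column,
    then spaces to blank the cells, then BS again to move back.
    """
    w = display_width(ch)
    return "\b" * w + " " * w + "\b" * w
```

For a kanji this yields two backspaces, two spaces and two backspaces; for an
ASCII letter it yields the familiar BS, space, BS.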

> On rare occasions I have found myself using vanilla (un-Japonified) vi
> on some text containing Japanese text. In that case, it takes two presses
> of backspace to remove a kanji or kana, but that is very much the
> exception.

It would be nice if backspace removed one whole character even in editors
that echo every keyboard byte to the terminal, including one as primitive as
the kernel tty cooked-mode line editor that you see when you type text into
the "cat" command.
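
What such a byte-oriented line editor would have to do on the input side can
be sketched as follows. This is an illustration in Python under the
assumption that the line buffer holds well-formed UTF-8, not the code of any
actual kernel; erase_last_char is a hypothetical name:

```python
def erase_last_char(buf: bytes) -> bytes:
    """Drop the last UTF-8 encoded character from buf, not just one byte.

    Continuation bytes have the bit pattern 10xxxxxx, so we step back
    over them until we reach the lead byte of the character.
    """
    i = len(buf)
    if i == 0:
        return buf
    i -= 1
    while i > 0 and (buf[i] & 0xC0) == 0x80:
        i -= 1
    return buf[:i]
```

A byte-at-a-time editor would instead remove only the final continuation
byte and leave a broken sequence in the buffer.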

> Well, I expect both BS and DEL to do that. No new sequence is needed.

I never understood what DEL is supposed to do today. This 7-bit ASCII
all-ones code (0x7F) was introduced historically (1960s!) to delete
characters from punched cards and tapes by punching out every hole (which a
reader will ignore by convention), but it has completely lost its reason
for existence on video terminals. ISO 2022 seems to treat it as some
strange kind of thing, neither a control nor a graphic character, just
like space. A charming reminder of how deeply our current conventions are
still rooted in the needs of completely obsolete, archaic technology.

The original historic purpose of both BS and CR in ANSI X3.4 and ECMA-6
was to allow extension of the glyph repertoire available on paper printers
by overstriking (accents, underlining, &c.).

  http://www.ecma.ch/ecma1/STAND/ECMA-006.HTM

ISO 8859 and ISO 10646 banned that use of BS and CR, which never worked on
video terminals anyway. There never was any formal standard, apart from
manufacturers' practice, on what codes function keys on terminals should
produce, which is why we ended up with considerable confusion over BS versus
DEL for the erase-character-to-left key and with two keys named Backspace
and Del on the usual keyboards. A related mess is that, thanks to the ESC
key, the byte sequences associated with keys do not even form a uniquely
decodable prefix code.
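
To illustrate the prefix problem, here is a small Python check. The key table
is an assumption: it lists byte sequences commonly produced by xterm-like
terminals, not values taken from any standard, and prefix_conflicts is a
hypothetical helper:

```python
# Assumed, typical key codes of an xterm-like terminal (illustrative only).
KEYS = {
    "Esc": b"\x1b",
    "Up": b"\x1b[A",
    "Backspace": b"\x7f",
}

def prefix_conflicts(codes):
    """Return all pairs (a, b) where code a is a proper prefix of code b."""
    conflicts = []
    for a in codes:
        for b in codes:
            if a != b and b.startswith(a):
                conflicts.append((a, b))
    return conflicts
```

With this table the only conflict is ESC against the cursor-up sequence: a
reader that has just received the byte 0x1B cannot tell from the bytes alone
whether the ESC key was pressed or a longer key sequence has begun, which is
why applications commonly fall back on timeouts.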

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
