Tomohiro KUBOTA wrote on 2001-02-03 01:27 UTC:
> If ISO-2022 escape sequences were be recognized in every modes,
> xterm -8, -u8, and -lc. If so, even if you invoked xterm with
> wrong mode, ISO-2022 string will be displayed correctly. Thus,
> application softwares can use ISO-2022 for very important messages.
Interesting twist on previously discussed arguments. So far I have seen
ISO 2022 mostly as the primary *cause* of mojibake (English: "garbled
text").
People occasionally dump binary data onto the Linux console by
accident, and the /dev/console driver now and then finds in it ISO 2022
or ad-hoc Linux ESC sequences that switch GL, GR, G0 or G1. As a
result, even ASCII text is rendered as DEC or CP437 block graphics
characters. The Japanese word mojibake sounds like an adequate
description for that effect. Xterm is somewhat less vulnerable because
it implements less charset switching functionality than the console, but
the problem exists there in principle as well. A common countermeasure
is to place a long list of sanity ESC sequences to restore state into
the Unix shell prompts.
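
To illustrate the kind of "sanity" sequence meant here, the following is
a minimal sketch and not a complete list: it only emits SI (invoke G0
into GL) followed by ESC ( B (designate ASCII as G0). The function names
are made up for this example, and real prompts usually carry a longer,
terminal-specific list.

```python
ESC = "\x1b"
SI = "\x0f"  # Shift In: invoke G0 into GL


def charset_reset() -> str:
    """Return control codes that put GL back on G0 and G0 back on ASCII."""
    return SI + ESC + "(B"


def bash_prompt(prompt: str = "\\$ ") -> str:
    """Prefix a bash prompt string with the reset sequence.

    The \\[ ... \\] markers tell bash that the enclosed bytes are
    non-printing, so line editing still computes the prompt width right.
    """
    return "\\[" + charset_reset() + "\\]" + prompt


if __name__ == "__main__":
    # Show the value one would assign to PS1 (hypothetical usage).
    print(repr(bash_prompt()))
```

With something like this in PS1, a stray charset switch survives only
until the next prompt is printed.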
As a result of such experiences, we discussed here a year or two ago
whether to embed UTF-8 into an ISO 2022 switching mechanism (ESC % G to
activate UTF-8 and ESC % @ to deactivate it, see
http://www.cl.cam.ac.uk/~mgk25/unicode.html#term
for a discussion of the formal details and references). If I remember
correctly, everyone seemed to favour designing "xterm -u8" such that it
permanently locks the terminal into UTF-8 and thus eliminates the
ISO 2022 mojibake hazards. It was thought that this should eventually
cut down significantly on the "Help, my terminal shows crazy
characters!" helpline calls. I still think it is a good idea to
encourage people to
use a UTF-8 mode in which ISO 2022 ESC sequences are ignored.
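
As a rough sketch of what "locked into UTF-8" could mean, the filter
below drops a subset of ISO 2022 charset-switching controls from a byte
stream and decodes the rest as UTF-8. This is an illustration only, not
how xterm implements it: the name lock_utf8 is hypothetical, and the
pattern covers only SO/SI, ESC % @, ESC % G and the common G0..G3
designation sequences (single shifts and other forms are left out).

```python
import re

# Charset-switching controls that a UTF-8-locked terminal would ignore.
_SWITCH = re.compile(
    rb"[\x0e\x0f]"                                # SO / SI
    rb"|\x1b%[@G]"                                # ESC % @ and ESC % G
    rb"|\x1b[()*+][\x30-\x7e]"                    # ESC ( F, ESC ) F, ...
    rb"|\x1b\$(?:[()*+][\x30-\x7e]|[@AB])"        # multi-byte designations
)


def lock_utf8(data: bytes) -> str:
    """Strip charset-switching controls, then decode the rest as UTF-8."""
    cleaned = _SWITCH.sub(b"", data)
    return cleaned.decode("utf-8", errors="replace")


if __name__ == "__main__":
    noisy = b"abc\x1b(0def\x0eghi\x1b%@" + "\u00e9".encode("utf-8")
    print(lock_utf8(noisy))
```

Other ESC sequences, such as colour changes, pass through untouched, so
accidental binary output can no longer flip the character set.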
The so-called ISO-2022-JP is probably less susceptible to mojibake than
proper ISO 2022, because ISO-2022-JP output traditionally resets the
ISO 2022 coding state of the terminal at the start of every new line
from scratch. Seems to work in practice, but is far from elegant.
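
The per-line behaviour can be observed with Python's built-in
"iso2022_jp" codec. This is only a small demonstration under the
assumption that the codec behaves like typical ISO-2022-JP producers: as
far as I can tell it returns to ASCII before each line ends, so every
line starts from a known state and can be decoded on its own. The sample
text is made up.

```python
text = "日本語\nabc\n漢字 mixed"

encoded = text.encode("iso2022_jp")

# JIS X 0208 bytes never contain 0x0A, so splitting on newline is safe.
lines = encoded.split(b"\n")

# Each line is decoded independently, with no state carried over.
decoded = [line.decode("iso2022_jp") for line in lines]

if __name__ == "__main__":
    for raw, txt in zip(lines, decoded):
        print(raw, "->", txt)
```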
Markus
--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/