Juliusz Chroboczek <[EMAIL PROTECTED]> wrote:

> There are two issues here.  One is what should happen in a UTF-8
> terminal emulator.  The other is whether it is okay to make an ISO
> 2022 terminal emulator generate UTF-8.
> 
> Please recall the framework -- we're working in a world where terminal
> emulators are the primary way of interacting with the system.  Some of
> us want this world to go UTF-8 -- and in order to do that, we're
> willing to modify not only the terminal emulators, but also the
> applications and the terminal driver.
>
Yes, of course if you modify the terminal driver to decode UTF-8 before
handling control characters, that solves the problem.

However, the original goal of UTF-8 was to be able to use it "behind the
back" of UNIX, just as we do now with ISO 8859.  But because UTF-8 is not
C1-safe, this idea breaks down in the terminal-to-host direction, and now
terminal drivers must be modified.  Worse: they probably will have to
support two modes: UTF-8 and "traditional", and this distinction must be
somehow coordinated with all their other modes: cooked, half-cooked, raw,
and so on.

> On the other hand, I do agree with you that an ISO 2022 emulator
> should not emit UTF-8, for the very reasons that you list (but also
> because I don't want to promote ISO 2022 -- if you want reliable
> keyboard input, use UTF-8.)
> 
It's not only a question of ISO 2022.  Leaving all notions of character-set
designation and invocation aside, we still must consider ISO 6429 and 4873.
Most UNIXes, going back to the very beginning, will treat any incoming
0x80-0x9F value as a control character and not a graphic, and this is
proper.  The very design of UTF-8 is flawed because the people who designed
it did not consider the C1 controls.
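
To make the collision concrete, here is a minimal Python sketch. It is an illustration only, not code from any terminal driver, and the function name is made up. It lists the bytes of a string's UTF-8 encoding that land in the C1 range 0x80-0x9F, which a traditional 8-bit host would take for control characters.

```python
def c1_bytes_in_utf8(text):
    """Return the bytes of text's UTF-8 encoding that fall in 0x80-0x9F (C1)."""
    return [b for b in text.encode("utf-8") if 0x80 <= b <= 0x9F]


# U+00DB (capital U with circumflex) is the single byte 0xDB in ISO 8859-1,
# which is outside C1.  In UTF-8 it becomes 0xC3 0x9B, and 0x9B is CSI.
print([hex(b) for b in "\u00db".encode("latin-1")])   # ISO 8859-1 bytes
print([hex(b) for b in "\u00db".encode("utf-8")])     # UTF-8 bytes
print([hex(b) for b in c1_bytes_in_utf8("\u00db")])   # the C1 collisions
```

Any UTF-8 continuation byte can take a value from 0x80 to 0xBF, so half of that range overlaps C1; the character above is just one convenient case.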

Prior to UTF-8, it was possible to both display and type "special"
characters on any UNIX or other 8-bit clean host.  The terminal type and
the character set (if its structure followed the standards) were independent
and could be mixed and matched as desired.  The terminal driver needed no
changes.

It would have been possible to design a C1-safe UTF-8 (and in fact it has
been done as a proof-of-concept), but it's too late now.  So the real
question is: can UTF-8 and ISO 4873 (which has specified the very structure
of coded character sets for 30 years) coexist without special assistance
from the terminal driver?  No.

Therefore with present-day terminal drivers, if you want reliable keyboard
input, *don't* use UTF-8!  And yet, obviously the emulator must emit UTF-8
if we are to benefit from Unicode.  Managing the transition is not going to
be easy.  Already we have had to modify terminal emulators to decode
incoming UTF-8 prior to looking for control characters or escape sequences
(backwards from normal and something I'm still not quite happy about).
Sending UTF-8 from the emulator to the host requires the host do the same
thing.  But a terminal emulator that works with a thus-modified host won't
work with traditional hosts.
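
The difference in ordering can be sketched in a few lines of Python. This is a toy model under simplifying assumptions (no escape-sequence parsing, no line discipline), and both function names are hypothetical. One function looks for controls byte by byte, as a traditional host would; the other decodes UTF-8 first and only then looks for controls.

```python
def is_control(value):
    """True for C0 (0x00-0x1F), DEL (0x7F), and C1 (0x80-0x9F) values."""
    return value < 0x20 or value == 0x7F or 0x80 <= value <= 0x9F


def controls_bytewise(data):
    """Traditional order: inspect raw bytes for controls, no decoding."""
    return [b for b in data if is_control(b)]


def controls_after_decode(data):
    """UTF-8 order: decode first, then inspect code points for controls."""
    return [ord(ch) for ch in data.decode("utf-8") if is_control(ord(ch))]


# Keyboard input: U+00DB followed by carriage return, sent as UTF-8.
typed = "\u00db\r".encode("utf-8")

print([hex(v) for v in controls_bytewise(typed)])      # sees a spurious CSI
print([hex(v) for v in controls_after_decode(typed)])  # sees only CR
```

The same input yields a different set of controls depending on the order, which is why an emulator and a host have to agree on it.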

It's an interesting problem.

- Frank

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
