Markus wrote:
> Frank da Cruz wrote on 2001-05-30 16:39 UTC:
> > Yes, of course if you modify the terminal driver to decode UTF-8 before
> > handling control characters, that solves the problem.
> > 
> > However, the original goal of UTF-8 was to be able to use it "behind the
> > back" of UNIX, just as we do now with ISO 8859.  But because UTF-8 is not
> > C1-safe, this idea breaks down in the terminal-to-host direction, and now
> > terminal drivers must be modified.
> 
> I am not aware of any common practice to assign special semantics to C1
> control characters in Unix tty.
>
Some UNIX tty drivers treat C1 controls as the corresponding C0 controls
(i.e. they ignore the 8th bit).  I tried this just now by sending \x83 to
SunOS 4.1 (old I know) and that's exactly what happened (it got a SIGINT),
even though I have "stty pass8".  This is a characteristic of 4.2/4.3BSD-derived
UNIXes.  Ditto in Ultrix 3.0, Xenix 2.3.4, and 4.2/4.3BSD itself.  Ditto in
HP-UX 10.20 (even though it is strict SVR4).  But not in SINIX 5.42 (also
strict SVR4), nor in Solaris 2.x, nor in Linux (RH 5.2 and later), FreeBSD,
AIX, OSR5, Unixware, etc.  So yes, most of the newer UNIXes appear to be
"Microsoft compliant" by virtue of having accommodated themselves to
CP437-based consoles (graphics in C1 area).  I could do a more thorough 
census next time I do a C-Kermit "build-all".
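
To make the 8th-bit behavior concrete, here is a minimal sketch of what
such a driver effectively does.  It assumes the driver simply masks each
input byte with 0x7F and that the interrupt character is the usual default
Ctrl-C (0x03); the function name is made up for illustration and is not
taken from any real tty driver.

```python
INTR = 0x03  # Ctrl-C (ETX), assumed to be the interrupt character


def strip_eighth_bit(byte: int) -> int:
    """Return the byte as a 7-bit driver would see it (8th bit dropped)."""
    return byte & 0x7F


def acts_as_interrupt(byte: int) -> bool:
    """True if a bit-stripping driver would treat this byte as INTR."""
    return strip_eighth_bit(byte) == INTR


# The C1 control 0x83 collapses onto the C0 control 0x03.
print(hex(strip_eighth_bit(0x83)), acts_as_interrupt(0x83))
```

On an 8-bit-clean driver the masking step is absent, so 0x83 is passed
through as data and never matches the interrupt character.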

> The only problem with UTF-8 that we have with ttys is that their
> "cooked" mode is a full-fledged editor that is not aware of *any*
> multi-byte character encodings, including UTF-8. My hope is that one of
> the next POSIX revisions will add a UTF-8 flag to struct termios, but I
> have no idea, whether that is already in the queue.
> 
This is going to be mighty complicated...  And it has to be done separately
for every UNIX variety.  Maybe we should forget about Linux and just start
using Plan 9 :-)
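
As a rough illustration of what a UTF-8-aware cooked mode would have to do
differently, the sketch below contrasts a byte-wise erase with one that
removes a whole encoded character.  It is only a model of the line-editing
logic, not code from any actual terminal driver, and both function names
are hypothetical.

```python
def erase_bytewise(line: bytes) -> bytes:
    """Erase as a byte-oriented line editor does: drop one byte."""
    return line[:-1]


def erase_utf8(line: bytes) -> bytes:
    """Erase one whole UTF-8 character from the end of the line."""
    end = len(line)
    # Step back over trailing continuation bytes (bit pattern 10xxxxxx).
    while end > 0 and (line[end - 1] & 0xC0) == 0x80:
        end -= 1
    # Then drop the lead byte (or the single ASCII byte).
    if end > 0:
        end -= 1
    return line[:end]


line = "caf\u00e9".encode("utf-8")   # b'caf\xc3\xa9'
print(erase_bytewise(line))          # leaves a dangling lead byte 0xC3
print(erase_utf8(line))              # removes the whole character
```

The byte-wise version leaves a truncated sequence in the input buffer,
which is the kind of damage a multi-byte-aware cooked mode would avoid.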

> I assume, the other "host communication" problems that you have referred
> to in a rather abstract way have to do with some DEC dinosaurs (VMS?),
> most of us really don't care about (because a cruel visionary had it
> reimplemented, crossed with Windows 3.1 and it's now called WinNT. :-)
> 
Well I would not be so quick to cast aspersions at well-thought-out,
consistent, proven, documented, and vigorously standards-compliant (if
overly verbose :-) operating systems, but now that you mention it, VMS (and
DEC in general) is/was indeed where you find the premier implementations of
ISO 2022 and 6429.  In VMS, C1 controls are not simply C0 controls with
their 8th bits on.  Also, despite all efforts to kill it off, VMS is
actually growing in popularity, although practically nobody wants to talk
about it in public for image reasons.

But OK, let's step back a minute.  My little experiment suggests that the
Linux terminal driver might not need any changes after all.  Using my UTF-8
terminal emulator (Kermit 95), I can indeed type the "Latin-1" characters
A-grave, A-acute, A-tilde, and so forth, over a Telnet connection at the Red
Hat 7.0 bash prompt and have them echo correctly.  In particular A-tilde,
whose UTF-8 representation contains a 0x83 byte, does not cause a SIGINT.
Ditto in Solaris 8, AIX 4.3, and probably most of the other current UNIXes.
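
For anyone who wants to repeat the experiment with other characters, here
is a small sketch that reports which bytes of a character's UTF-8 encoding
fall in the C1 range (0x80-0x9F).  The helper name is made up for
illustration; the encoding itself is just Python's standard UTF-8 codec.

```python
def c1_bytes(ch: str) -> list[int]:
    """Return the bytes of ch's UTF-8 encoding that lie in 0x80..0x9F."""
    return [b for b in ch.encode("utf-8") if 0x80 <= b <= 0x9F]


# A-tilde (U+00C3) encodes as C3 83, so it carries the byte 0x83.
print([hex(b) for b in "\u00c3".encode("utf-8")])
print([hex(b) for b in c1_bytes("\u00c3")])

# Latin-1 graphic characters whose UTF-8 form contains a C1-range byte.
affected = [chr(cp) for cp in range(0xA0, 0x100) if c1_bytes(chr(cp))]
print(len(affected), hex(ord(affected[0])), hex(ord(affected[-1])))
```

Within the Latin-1 graphic range this flags U+00C0 through U+00DF, so
A-grave, A-acute, and A-tilde are all useful test cases for a driver that
might be stripping the 8th bit.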

I can't do this with SunOS but I suppose it's not that important.  As much
as I like to support old OS's, the fact is that the recent wave of Internet
attacks, viruses, worms, etc., has forced most of them to upgrade or shut down.

OK, let's go ahead and try to solve specific problems as they come up.  For
example, can UTF-8 be used in the 7-bit environment? :-)

- Frank

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
