FdC> OK, let's go ahead and try to solve specific problems as they
FdC> come up. For example, can UTF-8 be used in the 7-bit
FdC> environment? :-)
Easy. A UTF-8 character is represented either by a GL code, in which
case it can represent itself, or by a sequence
    x_1 ... x_k
where the x_i are eight-bit codes with the high bit set. Such a
character can be represented by
    SO x'_1 ... x'_k SI
where the x'_i are the x_i with the high bit stripped.
It looks like I've been hacking at ISO 2022 too much lately.
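Here is a minimal Python sketch of the SO/SI trick above. It shifts each multibyte character separately and passes single-byte (ASCII) codes through unchanged. The function name `utf8_to_7bit` is hypothetical. One caveat the sketch does not address: stripping the high bit maps the UTF-8 continuation bytes 0x80-0xBF onto 0x00-0x3F. That range includes the C0 controls, and also SO (0x0E) and SI (0x0F) themselves. As a result, a decoder cannot always tell where a shifted run ends. For example, U+00CF is encoded C3 8F, which becomes SO 43 0F SI.

```python
SO = 0x0E  # ASCII Shift Out
SI = 0x0F  # ASCII Shift In

def utf8_to_7bit(text: str) -> bytes:
    """Hypothetical helper: naive SO/SI wrapping of UTF-8.

    Single-byte UTF-8 codes pass through unchanged. Each multibyte
    sequence x_1 ... x_k (all bytes with the high bit set) becomes
    SO x'_1 ... x'_k SI, where x'_i = x_i with the high bit stripped.
    Not unambiguously decodable: stripped bytes can equal SO/SI.
    """
    out = bytearray()
    for ch in text:
        seq = ch.encode("utf-8")
        if len(seq) == 1:
            out += seq                             # ASCII: represents itself
        else:
            out.append(SO)
            out += bytes(b & 0x7F for b in seq)    # strip the high bit
            out.append(SI)
    return bytes(out)

print(utf8_to_7bit("a\u00e9"))  # b'a\x0eC)\x0f'
```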
(Of course, the only reasonable thing to do in a 7-bit environment is
to base-65 the whole stream. I think you need a pad character to make
that work.)
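The "reasonable" route can be sketched with Python's standard `base64` module, assuming the stream is first encoded as UTF-8. The RFC 4648 alphabet has 64 symbols plus the `=` pad character, 65 in all, and every output byte is 7-bit ASCII. The helper names `to_7bit` and `from_7bit` are hypothetical.

```python
import base64

def to_7bit(text: str) -> bytes:
    # Encode to UTF-8, then base64 the whole byte stream.
    return base64.b64encode(text.encode("utf-8"))

def from_7bit(data: bytes) -> str:
    # Reverse: base64-decode, then decode the UTF-8 bytes.
    return base64.b64decode(data).decode("utf-8")

encoded = to_7bit("\u00e9t\u00e9")
print(encoded)  # b'w6l0w6k='  (5 UTF-8 bytes, so one '=' of padding)
```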
Juliusz
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/