FdC> OK, let's go ahead and try to solve specific problems as they
FdC> come up.  For example, can UTF-8 be used in the 7-bit
FdC> environment? :-)

Easy.  A UTF-8 character is represented either by a GL code, in which
case it can represent itself, or by a sequence

  x_1 ... x_k

where the x_i are eight-bit codes with the high bit set.  Such a
character can be represented by

  SO x'_1 ... x'_k SI

where the x'_i are the x_i with the high bit stripped.
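
A minimal Python sketch of the scheme above, not a definitive
implementation.  The names encode_7bit and decode_7bit are made up for
illustration.  Two assumptions go beyond the description: the input
contains no literal SO or SI, and the decoder takes the sequence length
from the lead byte, since a stripped continuation byte can itself equal
SO or SI.

```python
SO, SI = 0x0E, 0x0F  # ASCII Shift Out / Shift In


def encode_7bit(text: str) -> bytes:
    """Wrap each multi-byte UTF-8 character as SO x'_1 ... x'_k SI."""
    out = bytearray()
    for ch in text:
        raw = ch.encode("utf-8")
        if len(raw) == 1:
            # Assumption: literal SO/SI are rejected, since they would be
            # indistinguishable from the shift codes.
            if raw[0] in (SO, SI):
                raise ValueError("literal SO/SI would be ambiguous")
            out += raw
        else:
            out.append(SO)
            out += bytes(b & 0x7F for b in raw)  # strip the high bit
            out.append(SI)
    return bytes(out)


def decode_7bit(data: bytes) -> str:
    """Inverse of encode_7bit."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] != SO:
            out.append(data[i])
            i += 1
            continue
        if i + 1 >= len(data):
            raise ValueError("truncated shifted sequence")
        # Restore the lead byte and read the sequence length from it,
        # because a stripped continuation byte may equal SO or SI.
        lead = data[i + 1] | 0x80
        k = 4 if lead >= 0xF0 else 3 if lead >= 0xE0 else 2
        seq = data[i + 1 : i + 1 + k]
        if len(seq) != k or data[i + 1 + k : i + 2 + k] != bytes([SI]):
            raise ValueError("malformed shifted sequence")
        out += bytes(x | 0x80 for x in seq)  # put the high bit back
        i += k + 2
    return out.decode("utf-8")
```

For example, U+00E9 is C3 A9 in UTF-8, so it comes out as SO 0x43 0x29 SI.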

It looks like I've been hacking at ISO 2022 too much lately.

(Of course, the only reasonable thing to do in a 7-bit environment is
to base-64 the whole stream.  I think you need a pad character to make
that work.)
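
For comparison, a small sketch of the Base64 route using Python's
standard base64 module.  The helper names to_7bit and from_7bit are
hypothetical.  The pad character "=" shows up whenever the UTF-8 byte
count is not a multiple of three.

```python
import base64


def to_7bit(text: str) -> bytes:
    """Base64-encode the UTF-8 bytes of text; the output is pure ASCII."""
    return base64.b64encode(text.encode("utf-8"))


def from_7bit(data: bytes) -> str:
    """Inverse of to_7bit."""
    return base64.b64decode(data).decode("utf-8")
```

Here to_7bit("abc") needs no padding, while to_7bit("\u00e9") (two
bytes, C3 A9) ends in a single "=".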

                                        Juliusz



-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
