> FdC> OK, let's go ahead and try to solve specific problems as they
> FdC> come up. For example, can UTF-8 be used in the 7-bit
> FdC> environment? :-)
>
> Easy. A UTF-8 character is represented either by a GL code, in which
> case it can represent itself, or by a sequence
>
> x_1 ... x_k
>
> where the x_i are eight-bit codes with the high bit set. Such a
> character can be represented by
>
> SO x'_1 ... x'_k SI
>
> where the x'_i are the x_i with the high bit stripped.
>
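It was a trick question. SO/SI only work on characters in the G1 range
(0xA0-0xFF in an 8-bit environment), and C1 != G1. UTF-8 trail bytes run
from 0x80 to 0xBF, so some of them fall in C1 (0x80-0x9F), and stripping
the high bit from those yields C0 control codes. If shifting worked on C1
characters, then how would you tell the difference between a shifted SI
(0x8F with the high bit stripped) and an unshifted SI (0x0F)?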
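
To make the collision concrete, here is a minimal sketch of the quoted
scheme in Python. The names naive_7bit and naive_decode are made up for
illustration, and the decoder is only one plausible reading of the
proposal, not anybody's actual implementation. It round-trips U+00E9
(UTF-8 C3 A9) but not U+00CF (UTF-8 C3 8F), because 0x8F with the high
bit stripped is SI itself.

```python
SO, SI = 0x0E, 0x0F  # Shift Out / Shift In control codes


def naive_7bit(text: str) -> bytes:
    """Encode per the quoted proposal: ASCII as-is, everything else as
    SO + (UTF-8 bytes with the high bit stripped) + SI."""
    out = bytearray()
    for ch in text:
        raw = ch.encode("utf-8")
        if len(raw) == 1:
            out += raw  # 7-bit character represents itself
        else:
            out.append(SO)
            out += bytes(b & 0x7F for b in raw)
            out.append(SI)
    return bytes(out)


def naive_decode(data: bytes) -> bytes:
    """Hypothetical decoder: restore the high bit on every byte seen
    between SO and SI, and return the resulting UTF-8 bytes."""
    out = bytearray()
    shifted = False
    for b in data:
        if b == SO:
            shifted = True
        elif b == SI:
            shifted = False  # cannot tell a real SI from a stripped 0x8F
        else:
            out.append(b | 0x80 if shifted else b)
    return bytes(out)


ok = "\u00e9"   # UTF-8: C3 A9, trail byte is in the G1 range
bad = "\u00cf"  # UTF-8: C3 8F, trail byte is in the C1 range

print(naive_7bit(ok).hex(" "), "->", naive_decode(naive_7bit(ok)).hex(" "))
print(naive_7bit(bad).hex(" "), "->", naive_decode(naive_7bit(bad)).hex(" "))
```

The second line shows the trail byte being swallowed as a shift-in. The
same thing happens with any trail byte in 0x80-0x9F, since each one
strips down to some C0 control (0x8E becomes SO, 0x9B becomes ESC, and
so on).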
- Frank
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/