Followup to: <[EMAIL PROTECTED]>
By author: Juliusz Chroboczek <[EMAIL PROTECTED]>
In newsgroup: linux.utf8
>
> FdC> OK, let's go ahead and try to solve specific problems as they
> FdC> come up. For example, can UTF-8 be used in the 7-bit
> FdC> environment? :-)
>
> Easy. A UTF-8 character is represented either by a GL code, in which
> case it can represent itself, or by a sequence
>
> x_1 ... x_k
>
> where the x_i are eight-bit codes with the high bit set. Such a
> character can be represented by
>
> SO x'_1 ... x'_k SI
>
> where the x'_i are the x_i with the high bit stripped.
>
> It looks like I've been hacking at ISO 2022 too much lately.
>
> (Of course, the only reasonable thing to do in a 7-bit environment is
> to base-65 the whole stream. I think you need a pad character to make
> that work.)
>
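A minimal Python sketch of the SO ... SI transform quoted above. It works per character, exactly as described: single-byte characters pass through unchanged, and a multibyte UTF-8 sequence has each byte's high bit stripped and is wrapped in SO/SI. encode_so_si is a hypothetical helper name, not part of any library.

```python
SO, SI = 0x0E, 0x0F  # ASCII Shift Out / Shift In

def encode_so_si(text: str) -> bytes:
    """Naive 7-bit transform from the quoted post (hypothetical helper).

    Single-byte (ASCII) characters are copied unchanged. For every other
    character, its UTF-8 bytes (all >= 0x80) are emitted with the high
    bit stripped, bracketed as SO x'_1 ... x'_k SI.
    """
    out = bytearray()
    for ch in text:
        data = ch.encode("utf-8")
        if len(data) == 1:
            out += data  # plain ASCII represents itself
        else:
            out.append(SO)
            out += bytes(b & 0x7F for b in data)  # strip the high bit
            out.append(SI)
    return bytes(out)
```

This also exposes the weak spot. U+00CF encodes in UTF-8 as C3 8F, so encode_so_si("\u00cf") returns b"\x0eC\x0f\x0f". The stripped continuation byte 0F has the same value as SI, and a decoder has no way to tell which 0x0F closes the shift.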
If you're using ISO 2022-style escaping, you also need to put ESC in
front of any C1-turned-C0 character except the final SI. Yuck.
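The escaping rule can be sketched in Python as well. This is a hypothetical scheme that follows the description in this thread, not a conforming ISO 2022 encoding: strip the high bit, put ESC in front of any byte that lands in C0 (originally 0x80-0x9F), and close with an unescaped SI. The names encode_7bit and decode_7bit are made up for illustration. The sketch also assumes the input contains no raw SO/SI characters, and it does no validation of malformed input.

```python
SO, SI, ESC = 0x0E, 0x0F, 0x1B

def encode_7bit(text: str) -> bytes:
    """Hypothetical SO/SI scheme with ESC-escaping of C1-turned-C0 bytes."""
    out = bytearray()
    for ch in text:
        if ch in "\x0e\x0f":
            # Assumption of this sketch: raw SO/SI never occur in the input.
            raise ValueError("raw SO/SI not supported")
        data = ch.encode("utf-8")
        if len(data) == 1:
            out += data  # ASCII passes through unchanged
            continue
        out.append(SO)
        for b in data:
            low = b & 0x7F
            if low < 0x20:
                # 0x80-0x9F became a C0 control (possibly SI or ESC): escape it.
                out.append(ESC)
            out.append(low)
        out.append(SI)  # the final SI is never escaped
    return bytes(out)

def decode_7bit(data: bytes) -> str:
    """Inverse of encode_7bit; assumes well-formed input."""
    out = bytearray()
    shifted = False
    i = 0
    while i < len(data):
        b = data[i]
        if not shifted:
            if b == SO:
                shifted = True
            else:
                out.append(b)
        elif b == SI:
            shifted = False  # an unescaped SI ends the shifted run
        elif b == ESC:
            i += 1
            out.append(data[i] | 0x80)  # escaped byte: restore the high bit
        else:
            out.append(b | 0x80)
        i += 1
    return out.decode("utf-8")
```

With this rule, encode_7bit("\u00cf") yields b"\x0eC\x1b\x0f\x0f". Only the last, unescaped 0x0F is read as SI, so the ambiguity is gone, at the price of one extra byte per escaped code.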
If you're in a 7-bit environment:
a) Upgrade, if you can. This is the 21st century, guys.
b) Use UTF-7.
c) For the specific case of email, which in practice is usually 8-bit
clean even though the standards specify that plain SMTP is 7-bit for
historical reasons, use base64 or quoted-unprintable.
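Options b) and c) are already available in off-the-shelf tools. Here is a short Python illustration using only the standard library: the "utf-7" codec for UTF-7, and the base64 and quopri modules for the two MIME transfer encodings applied to UTF-8 bytes. The sample string is arbitrary. Base64 uses a 64-symbol alphabet plus "=" as a pad character.

```python
import base64
import quopri

text = "na\u00efve caf\u00e9 \u20ac"
utf8 = text.encode("utf-8")

# b) UTF-7 (RFC 2152): Python ships a "utf-7" codec.
as_utf7 = text.encode("utf-7")

# c) MIME transfer encodings applied to the UTF-8 bytes.
as_b64 = base64.b64encode(utf8)
as_qp = quopri.encodestring(utf8)

for name, enc in [("utf-7", as_utf7), ("base64", as_b64), ("quoted-printable", as_qp)]:
    assert all(b < 0x80 for b in enc), name  # every output byte is 7-bit
    print(f"{name:17} {enc!r}")

# Each form decodes back to the original text.
assert as_utf7.decode("utf-7") == text
assert base64.b64decode(as_b64).decode("utf-8") == text
assert quopri.decodestring(as_qp).decode("utf-8") == text
```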
-hpa
--
<[EMAIL PROTECTED]> at work, <[EMAIL PROTECTED]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/