Followup to:  <[EMAIL PROTECTED]>
By author:    Juliusz Chroboczek <[EMAIL PROTECTED]>
In newsgroup: linux.utf8
>
> FdC> OK, let's go ahead and try to solve specific problems as they
> FdC> come up.  For example, can UTF-8 be used in the 7-bit
> FdC> environment? :-)
> 
> Easy.  A UTF-8 character is represented either by a GL code, in which
> case it can represent itself, or by a sequence
> 
>   x_1 ... x_k
> 
> where the x_i are eight-bit codes with the high bit set.  Such a
> character can be represented by
> 
>   SO x'_1 ... x'_k SI
> 
> where the x'_i are the x_i with the high bit stripped.
> 
> It looks like I've been hacking at ISO 2022 too much lately.
> 
> (Of course, the only reasonable thing to do in a 7-bit environment is
> to base-64 the whole stream.  I think you need a pad character to make
> that work.)
> 

If you're using ISO 2022-style escaping, you also need to put ESC in
front of any byte that lands in the C0 control range once its high bit
is stripped (i.e. any C1-turned-C0 character), except the final SI.
Yuck.
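
For the curious, here is a rough Python sketch of the SO/SI scheme with
the extra ESC rule.  It is one possible reading, not a standard
encoding: the names encode_7bit and decode_7bit are made up for
illustration, "put ESC in front" is taken literally (ESC followed by
the stripped byte), and a literal SO in the input is simply rejected
because it would be ambiguous.

```python
SO, SI, ESC = 0x0E, 0x0F, 0x1B

def encode_7bit(text: str) -> bytes:
    """Hypothetical helper: wrap each multi-byte UTF-8 character in SO ... SI."""
    out = bytearray()
    for ch in text:
        raw = ch.encode("utf-8")
        if len(raw) == 1:
            # ASCII represents itself; a literal SO would confuse the decoder.
            if raw[0] == SO:
                raise ValueError("literal SO is ambiguous in this sketch")
            out.append(raw[0])
            continue
        out.append(SO)
        for b in raw:
            low = b & 0x7F          # strip the high bit
            if low < 0x20:          # a C1 byte has turned into a C0 control
                out.append(ESC)     # assumption: ESC followed by the stripped byte
            out.append(low)
        out.append(SI)              # the final SI is never escaped
    return bytes(out)

def decode_7bit(data: bytes) -> str:
    """Hypothetical inverse of encode_7bit."""
    out = bytearray()
    shifted = False
    it = iter(data)
    for b in it:
        if not shifted:
            if b == SO:
                shifted = True
            else:
                out.append(b)
        elif b == SI:
            shifted = False
        elif b == ESC:
            # The byte after ESC is data, even if it looks like SI or ESC.
            out.append(next(it) | 0x80)
        else:
            out.append(b | 0x80)
    return out.decode("utf-8")
```

With this reading, "é" (C3 A9) becomes SO 43 29 SI, while the euro sign
(E2 82 AC) needs the escape: SO 62 ESC 02 2C SI.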

If you're in a 7-bit environment:

a) Upgrade, if you can.  This is the 21st century, guys.

b) Use UTF-7.

c) For the specific case of email, which is usually 8-bit clean in
   practice even though the standards specify that plain SMTP is 7 bit
   for historical reasons, use base64 or quoted-printable.
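
As a quick illustration of options (b) and (c), the following sketch
uses only Python's standard library ("utf-7" codec, base64, quopri) to
push the same text through each encoding and back.  The sample string
and variable names are arbitrary choices for the example.

```python
import base64
import quopri

text = "Grüße, 世界"            # arbitrary sample text
utf8 = text.encode("utf-8")     # 8-bit UTF-8 bytes

# (b) UTF-7: encodes the text directly into 7-bit-safe bytes.
as_utf7 = text.encode("utf-7")

# (c) MIME transfer encodings applied to the UTF-8 bytes.
as_base64 = base64.b64encode(utf8)
as_qp = quopri.encodestring(utf8)

for name, blob in [("utf-7", as_utf7), ("base64", as_base64), ("qp", as_qp)]:
    # Every byte must fit in 7 bits.
    assert all(b < 0x80 for b in blob)
    print(name, blob)

# Each encoding round-trips back to the original text.
assert as_utf7.decode("utf-7") == text
assert base64.b64decode(as_base64).decode("utf-8") == text
assert quopri.decodestring(as_qp).decode("utf-8") == text
```

Quoted-printable stays readable when the text is mostly ASCII; base64
has a fixed overhead of about a third regardless of content.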

        -hpa

-- 
<[EMAIL PROTECTED]> at work, <[EMAIL PROTECTED]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt
-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/