Re: Squeeze one more bit into a UTF-8 sequence?

Markus Kuhn Sun, 20 Jan 2002 06:53:41 -0800

Michael B Allen wrote on 2002-01-20 06:51 UTC:
> Is there a way to encode /one more bit/ of information into a UTF-8
> sequence in a way that is mostly orthogonal to the encoding itself?


Then it wouldn't be UTF-8 any more.

What you could do is either prefix each release with some release
indicator symbol, or add for instance 0x200000 to a Unicode character to
turn it into a release code. Both approaches allow you to use a normal
UTF-8 decoder at the receiver's end.
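
A minimal Python sketch of the second approach, for illustration only. It
assumes the receiver's decoder accepts the original 31-bit UTF-8 forms of
ISO 10646 (sequences of up to 6 bytes); decoders that follow the later
restriction to U+10FFFF will reject these values. The names RELEASE_FLAG,
encode_ext, decode_ext and split_event are hypothetical, and the decoder
does not reject overlong forms or surrogates.

```python
# Sketch only: flag a key release by adding 0x200000 to the code point and
# sending the result in the original (up to 6-byte) UTF-8 scheme.
RELEASE_FLAG = 0x200000  # above U+10FFFF, so it cannot collide with a character

# (exclusive upper limit, sequence length, lead-byte prefix, lead-byte mask)
_FORMS = (
    (0x800,      2, 0xC0, 0xE0),
    (0x10000,    3, 0xE0, 0xF0),
    (0x200000,   4, 0xF0, 0xF8),
    (0x4000000,  5, 0xF8, 0xFC),
    (0x80000000, 6, 0xFC, 0xFE),
)

def encode_ext(value):
    """Encode an integer below 2**31 as an extended UTF-8 byte sequence."""
    if not 0 <= value <= 0x7FFFFFFF:
        raise ValueError("value out of range")
    if value < 0x80:
        return bytes([value])
    for limit, length, prefix, _mask in _FORMS:
        if value < limit:
            break
    tail = []
    for _ in range(length - 1):
        tail.append(0x80 | (value & 0x3F))  # low 6 bits per continuation byte
        value >>= 6
    return bytes([prefix | value] + tail[::-1])

def decode_ext(data):
    """Decode extended UTF-8 bytes into a list of integers."""
    values = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            length, value = 1, lead
        else:
            for _limit, length, prefix, mask in _FORMS:
                if lead & mask == prefix:
                    value = lead & ~mask & 0xFF
                    break
            else:
                raise ValueError("invalid lead byte")
        if i + length > len(data):
            raise ValueError("truncated sequence")
        for b in data[i + 1:i + length]:
            if b & 0xC0 != 0x80:
                raise ValueError("invalid continuation byte")
            value = (value << 6) | (b & 0x3F)
        values.append(value)
        i += length
    return values

def split_event(value):
    """Return (character, is_release) for a decoded value."""
    if value >= RELEASE_FLAG:
        return chr(value - RELEASE_FLAG), True
    return chr(value), False
```

With this sketch, a press of "A" is sent as the single byte 0x41, and the
matching release as the 5-byte sequence F8 88 80 81 81, which decodes to
0x200041.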

There is no standard for what you want to do, as this is getting very
far away from the classic VT100 / ISO 6429 terminal semantics. No matter
what you do, it will be your private encoding that isn't compatible with
anything else.

Make sure that the ESC sequence that you use to activate this private
mark/break mode is as long and obscure as possible (at least 10 bytes,
but still within the ECMA-48 syntax for ESC sequences!), to minimize
the chance that it is ever sent to the terminal by accident.

http://www.ecma.ch/ecma1/STAND/ECMA-048.HTM

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
