Michael B Allen wrote on 2002-01-20 06:51 UTC:

> Is there a way to encode /one more bit/ of information into a UTF-8
> sequence in a way that is mostly orthogonal to the encoding itself?

Then it wouldn't be UTF-8 any more. What you could do is either prefix each release with some release indicator symbol, or add, for instance, 0x200000 to a Unicode character to turn it into a release code. Both approaches allow you to use a normal UTF-8 decoder at the receiver's end.

There is no standard for what you want to do, as this is getting very far away from the classic VT100 / ISO 6429 terminal semantics. No matter what you do, it will be your private encoding that isn't compatible with anything else.

Make sure that the ESC sequence that you use to activate this private mark/break mode is as long and obscure as possible (at least 10 bytes, but still within the ECMA-48 syntax for ESC sequences!), to minimize the chance that it is ever sent to the terminal by accident.

http://www.ecma.ch/ecma1/STAND/ECMA-048.HTM

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
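
As an illustration of the second suggestion in the reply above (adding 0x200000 to a code point to mark a release), here is a minimal Python sketch. It assumes "release" means a key-release event, which the post does not state outright, and it assumes the original 31-bit UTF-8 definition (ISO 10646 / RFC 2279, sequences of up to 6 bytes), since a strict RFC 3629 decoder rejects anything above 0x10FFFF. All function names are hypothetical, and the decoder does not check for overlong forms.

```python
# Offset suggested in the post; it lies above the Unicode range (max 0x10FFFF),
# so an offset value can never collide with a real character.
RELEASE_OFFSET = 0x200000

# (exclusive upper limit, lead-byte marker, number of continuation bytes)
_RANGES = (
    (0x800, 0xC0, 1),
    (0x10000, 0xE0, 2),
    (0x200000, 0xF0, 3),
    (0x4000000, 0xF8, 4),
    (0x80000000, 0xFC, 5),
)


def encode_ucs4(value):
    """Encode an integer using the original 31-bit UTF-8 scheme (1 to 6 bytes)."""
    if not 0 <= value <= 0x7FFFFFFF:
        raise ValueError("value out of 31-bit range")
    if value < 0x80:
        return bytes([value])
    for limit, lead, ncont in _RANGES:
        if value < limit:
            out = [lead | (value >> (6 * ncont))]
            for i in range(ncont - 1, -1, -1):
                out.append(0x80 | ((value >> (6 * i)) & 0x3F))
            return bytes(out)


def decode_ucs4(data):
    """Decode a byte string into a list of integers (no overlong-form checks)."""
    values = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            value, ncont = b, 0
        elif b >> 5 == 0b110:
            value, ncont = b & 0x1F, 1
        elif b >> 4 == 0b1110:
            value, ncont = b & 0x0F, 2
        elif b >> 3 == 0b11110:
            value, ncont = b & 0x07, 3
        elif b >> 2 == 0b111110:
            value, ncont = b & 0x03, 4
        elif b >> 1 == 0b1111110:
            value, ncont = b & 0x01, 5
        else:
            raise ValueError("invalid lead byte")
        tail = data[i + 1:i + 1 + ncont]
        if len(tail) != ncont or any(t & 0xC0 != 0x80 for t in tail):
            raise ValueError("truncated or malformed sequence")
        for t in tail:
            value = (value << 6) | (t & 0x3F)
        values.append(value)
        i += 1 + ncont
    return values


def encode_key_event(char, released):
    """Encode a press as the plain character, a release as character + offset."""
    cp = ord(char)
    return encode_ucs4(cp + RELEASE_OFFSET if released else cp)


def decode_key_events(data):
    """Return (character, released) pairs from an encoded event stream."""
    events = []
    for value in decode_ucs4(data):
        released = value >= RELEASE_OFFSET
        events.append((chr(value - RELEASE_OFFSET if released else value), released))
    return events
```

Press events remain byte-for-byte ordinary UTF-8, while each release becomes a 5-byte sequence that the receiver separates from real characters with a single comparison against the offset.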
