Bram Moolenaar wrote on 2001-02-16 11:30 UTC:
> When the termcap file is in UTF-8
> format, this then contains the four bytes 0xc3 0x8c 0xc2 0x9b.
Good lord, no. You have just invented UTF-64! The above byte sequence is
what you get if you take CSI in some ISO 4873 conforming encoding (0x9B)
and send it twice (!) through, for instance, an ISO 8859-1 -> UTF-8
converter:
9b -> c2 9b -> c3 82 c2 9b -> c3 83 c2 82 c3 82 c2 9b -> ...
|     |        |              |
|     UTF-8    |              UTF-8^3 = "UTF-512"
|              |
ISO 8859       UTF-8^2 = "UTF-64"
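
The chain above can be reproduced with a minimal Python sketch. It assumes the converter simply treats every input byte as an ISO 8859-1 character and re-encodes it as UTF-8; the function names latin1_to_utf8 and repeated_conversion are made up for this illustration and do not come from any real tool.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Treat each input byte as an ISO 8859-1 character and encode it as UTF-8."""
    return data.decode("latin-1").encode("utf-8")


def repeated_conversion(data: bytes, times: int) -> list[bytes]:
    """Return the input followed by the result of each successive conversion."""
    stages = [data]
    for _ in range(times):
        stages.append(latin1_to_utf8(stages[-1]))
    return stages


if __name__ == "__main__":
    # Start from CSI (0x9B) and run it through the converter three times.
    for n, stage in enumerate(repeated_conversion(b"\x9b", 3)):
        print(n, stage.hex(" "))
```

Stage 1 is the correct UTF-8 form of CSI; every later stage is the result of feeding data that is already UTF-8 into the converter again.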
Whenever you find someone producing such byte sequences, you have seen
the fallout of a serious intellectual accident.
Markus
--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/