Re: Termcap and UTF-8 CSI

Markus Kuhn Fri, 16 Feb 2001 04:23:30 -0800

Bram Moolenaar wrote on 2001-02-16 11:30 UTC:
> When the termcap file is in UTF-8
> format, this then contains the four bytes 0xc3 0x8c 0xc2 0x9b.

Good lord, no. You have just invented UTF-64! The above byte sequence is
what you get if you take CSI in some ISO 4873-conforming encoding (0x9B)
and send it twice (!) through, for instance, an ISO 8859-1 -> UTF-8
converter:

  9b -> c2 9b -> c3 82 c2 9b -> c3 83 c2 82 c3 82 c2 9b -> ...

  |     |        |              |
  |     UTF-8    |              UTF-8^3 = "UTF-512"
  |              |
  ISO 8859       UTF-8^2 = "UTF-64"
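
The chain above can be reproduced with a few lines of Python. This is
only a sketch: the helper name latin1_to_utf8 is made up for
illustration, and it assumes the converter treats every input byte as
one ISO 8859-1 character, which is what a naive 8859-1 -> UTF-8 filter
would do.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Hypothetical naive converter: read each byte as one ISO 8859-1
    character and re-encode the resulting text as UTF-8."""
    return data.decode("latin-1").encode("utf-8")


csi = b"\x9b"                    # CSI as a single C1 byte (ISO 8859)
once = latin1_to_utf8(csi)       # proper UTF-8 encoding of U+009B
twice = latin1_to_utf8(once)     # UTF-8 bytes misread as Latin-1: "UTF-64"
thrice = latin1_to_utf8(twice)   # and misread once more: "UTF-512"

for label, data in [("ISO 8859", csi), ("UTF-8", once),
                    ("UTF-8^2", twice), ("UTF-8^3", thrice)]:
    # bytes.hex() with a separator needs Python 3.8 or later
    print(f"{label:9}", data.hex(" "))
```

Each extra pass can be undone by going the other way, for example
twice.decode("utf-8").encode("latin-1") gives back the two bytes c2 9b,
but only as long as you know how many times the data was converted.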

Whenever you find someone producing such byte sequences, you have seen
the fallout of a serious intellectual accident.

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/
