UTF-8B support for libiconv

Ben Wiley Sittler Sun, 02 Apr 2006 20:52:44 -0700

[ this is in response to a truly ancient linux-utf8 thread ]

i wrote a patch that provides UTF-8 + binary in one codec with no
hand-waving, using Markus Kuhn's brilliant proposal to encode invalid
bytes 0xyz using unpaired surrogates U+DCyz. this means there need not
be a text/binary distinction for UTF-8-using programs. legal UTF-8
decodes/encodes correctly, and other bytes are handled as "opaque"
U+DCxx on input and correctly serialized on output. so one can once
again consider editing a binary format with a "notepad"-type editor
without sacrificing internationalization support.
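
a minimal sketch of the byte <-> U+DCxx mapping described above. this is
not code from the patch: it leans on python's built-in "surrogateescape"
error handler, which maps each undecodable byte 0x80..0xFF to
U+DC80..U+DCFF and back, as an assumed stand-in for the same idea. the
helper names utf8b_decode and utf8b_encode are made up for illustration.

```python
def utf8b_decode(data: bytes) -> str:
    # valid UTF-8 decodes normally; each invalid byte 0xYZ becomes
    # the unpaired low surrogate U+DCYZ
    return data.decode("utf-8", errors="surrogateescape")


def utf8b_encode(text: str) -> bytes:
    # ordinary characters are encoded as UTF-8; each U+DCYZ escape
    # is written back out as the single raw byte 0xYZ
    return text.encode("utf-8", errors="surrogateescape")


# "caf\xc3\xa9" is legal UTF-8; the trailing three bytes are not
raw = b"caf\xc3\xa9 \xff\xfe\x80"
text = utf8b_decode(raw)

# the invalid bytes survive as opaque surrogates...
assert text == "caf\u00e9 \udcff\udcfe\udc80"
# ...and the round trip gives back exactly the original bytes
assert utf8b_encode(text) == raw
```

with the patched libiconv the same round trip would presumably go through
iconv with the codec name the patch registers; check the diff for the
exact name.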


Markus Kuhn's description of the idea (search for "option d"):

http://mail.nl.linux.org/linux-utf8/2000-07/msg00040.html

the patch:

http://xent.com/~bsittler/libiconv-1.9.1-utf-8b.diff

enjoy! (not sure how/whether this fits into the official distro, but i
hope it gets used)

-ben

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
