Carl W. Brown writes:

> But UTF-8 is not without its own problems.  Take Oracle for example.

Most of the world is not Oracle. If Oracle uses its own encodings, let
Oracle deal with it.

> They designed UTF-8 to encode UCS-2 not UTF-16.

No, Oracle did not design UTF-8 at all. RFC 2279 specifies UTF-8,
and it encodes all characters from U+00000000 to U+7FFFFFFF.
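To make the range concrete, here is a minimal sketch of an encoder that
follows the byte layout table in RFC 2279 (one to six bytes per
character). The function name utf8_encode_rfc2279 is made up for this
illustration and is not part of any library. Note that the later
RFC 3629 restricts UTF-8 to U+10FFFF, so the 5- and 6-byte forms below
only apply to the RFC 2279 definition.

```c
#include <stddef.h>

/* Hypothetical helper: encode one UCS-4 value as UTF-8 per RFC 2279.
   Writes up to 6 bytes into out and returns the number of bytes written,
   or 0 if c is above 0x7FFFFFFF. */
size_t utf8_encode_rfc2279(unsigned long c, unsigned char out[6])
{
    size_t len, i;
    unsigned char lead;

    if (c < 0x80UL) {                 /* 0xxxxxxx */
        out[0] = (unsigned char)c;
        return 1;
    } else if (c < 0x800UL) {         /* 110xxxxx + 1 continuation byte */
        len = 2; lead = 0xC0;
    } else if (c < 0x10000UL) {       /* 1110xxxx + 2 */
        len = 3; lead = 0xE0;
    } else if (c < 0x200000UL) {      /* 11110xxx + 3 */
        len = 4; lead = 0xF0;
    } else if (c < 0x4000000UL) {     /* 111110xx + 4 */
        len = 5; lead = 0xF8;
    } else if (c <= 0x7FFFFFFFUL) {   /* 1111110x + 5 */
        len = 6; lead = 0xFC;
    } else {
        return 0;                     /* outside the RFC 2279 range */
    }

    /* Fill continuation bytes (10xxxxxx) from the last one backwards. */
    for (i = len - 1; i > 0; i--) {
        out[i] = (unsigned char)(0x80 | (c & 0x3F));
        c >>= 6;
    }
    out[0] = (unsigned char)(lead | c);
    return len;
}
```

For example, utf8_encode_rfc2279(0x20AC, buf) stores the three bytes
E2 82 AC, and the upper limit 0x7FFFFFFF takes the full six bytes.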

> I am not familiar with libiconv.

ftp://ftp.gnu.org/pub/gnu/libiconv/libiconv-1.7.tar.gz

> ICU has an invalid character callback handler.  I use it for example to
> convert characters that are not in the code page to HTML/XML escape
> sequences.

You can do that with iconv() as well. With iconv(), the processing
simply stops at an invalid/unconvertible character, and the programmer
can do any kind of error handling before restarting the conversion.
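As a rough sketch of that stop-and-restart pattern: the function below
converts UTF-8 to ISO-8859-1 and writes an HTML/XML numeric character
reference for every character the target does not have. The names
utf8_to_latin1_escaped and decode_utf8 are invented for this example.
It assumes an iconv() that reports unconvertible characters with EILSEQ
and leaves the input pointer at the offending character, as glibc and
libiconv do; other implementations may substitute a replacement
character instead.

```c
#include <errno.h>
#include <iconv.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: decode one UTF-8 sequence (up to 4 bytes) at s.
   Returns its length in bytes, or 0 if it is malformed or truncated. */
static size_t decode_utf8(const unsigned char *s, size_t n, unsigned long *cp)
{
    size_t len, i;

    if (n == 0) return 0;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; *cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; *cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; *cp = s[0] & 0x07; }
    else return 0;
    if (len > n) return 0;
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }
    return len;
}

/* Convert the UTF-8 string in to ISO-8859-1 in out (NUL-terminated),
   escaping unconvertible characters as &#N;. Returns 0 on success,
   -1 on malformed input, a too-small buffer, or iconv_open() failure. */
int utf8_to_latin1_escaped(const char *in, char *out, size_t outsize)
{
    iconv_t cd;
    char *inp = (char *)in;
    char *outp = out;
    size_t inleft = strlen(in);
    size_t outleft;
    int rc = 0;

    if (outsize == 0) return -1;
    outleft = outsize - 1;              /* reserve room for the NUL */
    cd = iconv_open("ISO-8859-1", "UTF-8");
    if (cd == (iconv_t)-1) return -1;

    while (inleft > 0) {
        if (iconv(cd, &inp, &inleft, &outp, &outleft) != (size_t)-1)
            break;                      /* all input consumed */
        if (errno == EILSEQ) {
            /* iconv() stopped at inp: handle that character ourselves. */
            unsigned long cp;
            size_t len = decode_utf8((unsigned char *)inp, inleft, &cp);
            int n;

            if (len == 0) { rc = -1; break; }   /* really invalid input */
            n = snprintf(outp, outleft + 1, "&#%lu;", cp);
            if (n < 0 || (size_t)n > outleft) { rc = -1; break; }
            outp += n;  outleft -= (size_t)n;
            inp += len; inleft -= len;  /* skip it, then restart iconv() */
        } else {
            rc = -1;                    /* E2BIG or EINVAL */
            break;
        }
    }
    *outp = '\0';
    iconv_close(cd);
    return rc;
}
```

Under those assumptions, the UTF-8 input "a\xE2\x82\xACb" (an euro sign
between two letters) comes out as "a&#8364;b", while characters that
exist in ISO-8859-1 are converted normally.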

> Looking at the iconv() I did not see any provisions for special invalid
> character handling.  Do you have this kind of support in libiconv?

Sure. It is even built-in.

Bruno
-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
