Carl W. Brown writes:
> But UTF-8 is not without its own problems. Take Oracle for example.
Most of the world is not Oracle. If Oracle uses its own encodings, let
Oracle deal with it.
> They designed UTF-8 to encode UCS-2 not UTF-16.
No, Oracle did not design UTF-8 at all. RFC 2279 specifies UTF-8,
and it encodes all characters from U+00000000 to U+7FFFFFFF, not just
the 16-bit range of UCS-2.
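To make the range concrete, here is a minimal sketch of an encoder for
the RFC 2279 form of UTF-8 (1 to 6 bytes per character). The function
name utf8_encode_rfc2279 is made up for this illustration and is not
part of any library. Note that a character beyond U+FFFF is encoded
directly in 4 or more bytes, not as a pair of surrogates.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only.
   Encodes one code point in the RFC 2279 form of UTF-8, which covers
   U+00000000..U+7FFFFFFF using 1 to 6 bytes.
   'out' must have room for 6 bytes.
   Returns the number of bytes written, or 0 if cp is out of range. */
size_t utf8_encode_rfc2279(uint32_t cp, unsigned char *out)
{
    size_t len, i;

    if (cp < 0x80) {            /* 7 bits: plain ASCII, one byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    else if (cp < 0x800)      len = 2;   /* up to 11 bits */
    else if (cp < 0x10000)    len = 3;   /* up to 16 bits */
    else if (cp < 0x200000)   len = 4;   /* up to 21 bits */
    else if (cp < 0x4000000)  len = 5;   /* up to 26 bits */
    else if (cp <= 0x7FFFFFFF) len = 6;  /* up to 31 bits */
    else return 0;

    /* Continuation bytes, filled from the end: 10xxxxxx */
    for (i = len - 1; i > 0; i--) {
        out[i] = (unsigned char)(0x80 | (cp & 0x3F));
        cp >>= 6;
    }
    /* Lead byte: 'len' one-bits, a zero bit, then the remaining bits */
    out[0] = (unsigned char)((0xFF << (8 - len)) | cp);
    return len;
}
```

For example, U+10000 comes out as the four bytes F0 90 80 80, and
U+7FFFFFFF as the six bytes FD BF BF BF BF BF.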
> I am not familiar with libiconv.
ftp://ftp.gnu.org/pub/gnu/libiconv/libiconv-1.7.tar.gz
> ICU has an invalid character callback handler. I use it for example to
> convert characters that are not in the code page to HTML/XML escape
> sequences.
You can do that with iconv() as well. With iconv(), the processing
simply stops at an invalid/unconvertible character, and the programmer
can do any kind of error handling before restarting the conversion.
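A rough sketch of such a loop, converting UTF-8 to ISO-8859-1 and
writing an HTML/XML numeric character reference for each character
the target cannot represent. The names decode_utf8 and
utf8_to_latin1_with_ncr are made up for this example. It assumes an
iconv() that fails with EILSEQ on an unconvertible character and
leaves the input pointer at that character, as GNU libiconv and glibc
do; some other implementations substitute a replacement character
instead, and then the fallback branch is never reached.

```c
#include <errno.h>
#include <iconv.h>
#include <stdio.h>

/* Hypothetical helper: decode one UTF-8 sequence (1 to 4 bytes) at s.
   Returns the number of bytes consumed, or 0 if the sequence is
   truncated or malformed. Does not reject overlong forms. */
static size_t decode_utf8(const unsigned char *s, size_t n, unsigned long *cp)
{
    size_t len, i;

    if (n == 0) return 0;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; *cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; *cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; *cp = s[0] & 0x07; }
    else return 0;
    if (n < len) return 0;
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }
    return len;
}

/* Hypothetical example function: converts 'inlen' bytes of UTF-8 to a
   NUL-terminated ISO-8859-1 string in 'out' (capacity 'outsize').
   Unconvertible characters become "&#N;". Returns 0 on success, -1 on
   malformed input, lack of output space, or iconv_open() failure. */
int utf8_to_latin1_with_ncr(const char *in, size_t inlen,
                            char *out, size_t outsize)
{
    iconv_t cd;
    char *inp = (char *)in;
    char *outp = out;
    size_t outleft;
    int rc = 0;

    if (outsize == 0) return -1;
    outleft = outsize - 1;              /* reserve room for the NUL */

    cd = iconv_open("ISO-8859-1", "UTF-8");
    if (cd == (iconv_t)-1) return -1;

    while (inlen > 0) {
        unsigned long cp;
        size_t len;
        int n;

        if (iconv(cd, &inp, &inlen, &outp, &outleft) != (size_t)-1)
            break;                      /* everything was converted */
        if (errno != EILSEQ) {          /* E2BIG or EINVAL: give up */
            rc = -1;
            break;
        }
        /* iconv() stopped with inp at the offending character.
           Do our own error handling, then restart the conversion. */
        len = decode_utf8((const unsigned char *)inp, inlen, &cp);
        if (len == 0) { rc = -1; break; }   /* input itself is malformed */
        n = snprintf(outp, outleft + 1, "&#%lu;", cp);
        if (n < 0 || (size_t)n > outleft) { rc = -1; break; }
        outp += n;  outleft -= (size_t)n;
        inp += len; inlen -= len;
    }
    *outp = '\0';
    iconv_close(cd);
    return rc;
}
```

With the 9-byte UTF-8 input for "café €", the é is converted to the
single byte E9 and the euro sign, which ISO-8859-1 lacks, is written
as "&#8364;".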
> Looking at the iconv() I did not see any provisions for special invalid
> character handling. Do you have this kind of support in libiconv?
Sure. It is even built-in.
Bruno
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/