Sun, 03 Sep 2000 16:12:09 +0100, Markus Kuhn <[EMAIL PROTECTED]> wrote:
> UTF-8-test.txt
Thanks, it helped me find a bug in my UTF-8 decoder for Haskell.
It also showed me that iconv in glibc-2.1.3 sucks: a "break" in the UTF-8
decoder applies to the wrong loop, many illegal sequences are not detected
at all, and the UCS-4 encoder sets a bad errno when it is given an
odd-sized output buffer.
libiconv is better, but it sometimes returns more U+FFFD characters than
the test file recommends.
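To make the kind of checks concrete, here is a minimal sketch in Python of a strict decoder that rejects overlong forms, surrogates, U+FFFE and U+FFFF. It is not the Haskell decoder mentioned above and not iconv's code; the names decode_utf8_strict and is_valid_utf8 are made up for this example, and it assumes sequences of at most four bytes (code points up to U+10FFFF).

```python
from typing import List


def decode_utf8_strict(data: bytes) -> List[int]:
    """Decode UTF-8 bytes to code points; raise ValueError on illegal input."""
    out = []
    i = 0
    while i < len(data):
        b0 = data[i]
        # The lead byte gives the number of continuation bytes, the payload
        # bits, and the smallest code point that really needs this length.
        if b0 < 0x80:
            need, cp, minimum = 0, b0, 0x0
        elif 0xC0 <= b0 <= 0xDF:
            need, cp, minimum = 1, b0 & 0x1F, 0x80
        elif 0xE0 <= b0 <= 0xEF:
            need, cp, minimum = 2, b0 & 0x0F, 0x800
        elif 0xF0 <= b0 <= 0xF7:
            need, cp, minimum = 3, b0 & 0x07, 0x10000
        else:
            # Stray continuation byte, or a lead byte for 5+ byte sequences
            # (assumption of this sketch: those are not accepted).
            raise ValueError("bad lead byte at offset %d" % i)
        if i + need > len(data) - 1:
            raise ValueError("truncated sequence at offset %d" % i)
        for k in range(1, need + 1):
            b = data[i + k]
            if (b & 0xC0) != 0x80:
                raise ValueError("bad continuation byte at offset %d" % (i + k))
            cp = (cp << 6) | (b & 0x3F)
        if cp < minimum:
            raise ValueError("overlong sequence at offset %d" % i)
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError("surrogate U+%04X at offset %d" % (cp, i))
        if cp in (0xFFFE, 0xFFFF):
            raise ValueError("illegal code position U+%04X at offset %d" % (cp, i))
        if cp > 0x10FFFF:
            raise ValueError("code point out of range at offset %d" % i)
        out.append(cp)
        i += need + 1
    return out


def is_valid_utf8(data: bytes) -> bool:
    """Convenience wrapper: True if decode_utf8_strict accepts the input."""
    try:
        decode_utf8_strict(data)
    except ValueError:
        return False
    return True
```

With this sketch, is_valid_utf8(b"\xc0\xaf") (an overlong "/") and is_valid_utf8(b"\xed\xa0\x80") (U+D800) are both False, while the boundary cases U+D7FF and U+FFFD are accepted. It stops at the first error rather than substituting U+FFFD, so it says nothing about how many replacement characters to emit.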
> It now contains an additional section 5 with UTF-8 sequences for
> illegal code positions that a good decoder should reject (surrogates,
> U+FFFE, U+FFFF) like overlong and malformed sequences for security
> reasons, as well as all the relevant legal boundary conditions
> for these.
Should decoders for other formats also reject these code positions where
applicable, e.g. U+FFFF in UTF-16 or surrogates in UCS-4?
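If the answer turns out to be yes, one way to do it would be to share a single predicate between all decoders. A minimal sketch in Python for big-endian UCS-4 follows; the names is_illegal_code_position and decode_ucs4_be_strict are hypothetical, and whether these positions should be rejected here at all is exactly the open question.

```python
from typing import List


def is_illegal_code_position(cp: int) -> bool:
    """True for surrogates, U+FFFE and U+FFFF (the positions listed above)."""
    return 0xD800 <= cp <= 0xDFFF or cp in (0xFFFE, 0xFFFF)


def decode_ucs4_be_strict(data: bytes) -> List[int]:
    """Decode big-endian UCS-4, applying the same check as the UTF-8 case."""
    if len(data) % 4 != 0:
        raise ValueError("UCS-4 input length is not a multiple of 4")
    out = []
    for i in range(0, len(data), 4):
        cp = int.from_bytes(data[i:i + 4], "big")
        if is_illegal_code_position(cp):
            raise ValueError("illegal code position U+%04X at offset %d" % (cp, i))
        out.append(cp)
    return out
```

A UTF-16 decoder could call the same predicate on each decoded value after pairing surrogates, so that only unpaired surrogates, U+FFFE and U+FFFF are refused.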
--
__("< Marcin Kowalczyk * [EMAIL PROTECTED] http://qrczak.ids.net.pl/
\__/
^^ SUBSTITUTE SIGNATURE
QRCZAK
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/