Carl W. Brown <[EMAIL PROTECTED]>:

> If they validate UTF-8 (xiua_ValidateStr) it will check each character to be
> a valid UTF-8 initial character followed by the proper number of
> continuation characters if any.  It will make sure that it is not a
> surrogate character nor a reversed BOM nor exceed the Unicode 3.1 character
> range.

Note also that "\xe0\x84\x80" is illegal, for example: it is an
overlong (non-shortest) encoding of U+0100, which should be
represented only by "\xc4\x80".
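
To make the arithmetic concrete, here is a rough sketch in Python (not
the xiua code; raw_decode, is_overlong and MIN_FOR_LENGTH are names
made up for this illustration). It decodes a single sequence by bit
shuffling alone, with no validity checks, and then compares the result
against the smallest value that legitimately needs that many bytes.

```python
# Smallest code point that genuinely requires a sequence of each length.
MIN_FOR_LENGTH = {1: 0x0, 2: 0x80, 3: 0x800, 4: 0x10000}


def raw_decode(seq: bytes) -> int:
    """Decode one UTF-8-style sequence by bit arithmetic, without validation."""
    lead = seq[0]
    if lead < 0x80:
        return lead
    if lead >> 5 == 0b110:        # 110xxxxx: two-byte lead
        value = lead & 0x1F
    elif lead >> 4 == 0b1110:     # 1110xxxx: three-byte lead
        value = lead & 0x0F
    else:                         # assumed 11110xxx: four-byte lead
        value = lead & 0x07
    for byte in seq[1:]:
        value = (value << 6) | (byte & 0x3F)   # append 6 payload bits
    return value


def is_overlong(seq: bytes) -> bool:
    """True if the sequence uses more bytes than its value needs."""
    return raw_decode(seq) < MIN_FOR_LENGTH[len(seq)]
```

Both byte strings decode to 0x100 here, but only the three-byte form
is flagged, since anything below U+0800 fits in two bytes.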

Perhaps you want to exclude U+FFFF, too.
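
Putting the checks discussed in this thread together, a validator
might look roughly like the sketch below. This is only my reading of
the rules listed above, not the actual xiua_ValidateStr; the name
is_valid_utf8 is hypothetical. Rejecting U+FFFE and U+FFFF is a policy
choice here: they are noncharacters, and many decoders accept them.

```python
# Smallest code point allowed for each sequence length (index = length).
MIN_VALUE = (0, 0, 0x80, 0x800, 0x10000)


def is_valid_utf8(data: bytes) -> bool:
    """Strict check: well-formed, shortest-form, and within policy limits."""
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            length, value = 1, lead
        elif lead >> 5 == 0b110:
            length, value = 2, lead & 0x1F
        elif lead >> 4 == 0b1110:
            length, value = 3, lead & 0x0F
        elif lead >> 3 == 0b11110:
            length, value = 4, lead & 0x07
        else:
            return False              # stray continuation or invalid lead byte
        tail = data[i + 1 : i + length]
        if len(tail) != length - 1:
            return False              # sequence truncated at end of input
        for byte in tail:
            if byte >> 6 != 0b10:
                return False          # not a continuation byte
            value = (value << 6) | (byte & 0x3F)
        if value < MIN_VALUE[length]:
            return False              # overlong, e.g. "\xe0\x84\x80"
        if 0xD800 <= value <= 0xDFFF:
            return False              # surrogate
        if value in (0xFFFE, 0xFFFF):
            return False              # reversed BOM and U+FFFF (policy choice)
        if value > 0x10FFFF:
            return False              # beyond the Unicode 3.1 range
        i += length
    return True
```

With this, is_valid_utf8(b"\xc4\x80") holds, while the overlong
"\xe0\x84\x80", the surrogate "\xed\xa0\x80" and "\xef\xbf\xbf"
(U+FFFF) are all rejected.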

Edmund
-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/