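While I'm on the topic: looking through the libunicode source, it doesn't seem to reject overlong (non-shortest-form) UTF-8 sequences. That invites security holes, since a character such as '/' can be slipped past a filter in a longer encoding, and it is not standard-compliant.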
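
To make the concern concrete, here is a rough sketch in C of a decoder that refuses overlong forms. It is only an illustration: the name utf8_decode and the UTF8_ERR_* constants are made up for this example and are not libunicode's API, and it assumes the 4-byte limit of RFC 3629 (so lead bytes 0xF8 and up are treated as invalid, and surrogates and values above 0x10FFFF are rejected).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical error codes, one per failure kind (not from libunicode). */
enum {
    UTF8_ERR_TRUNCATED       = -1, /* input ended in the middle of a sequence */
    UTF8_ERR_INVALID_BYTE    = -2, /* byte that can never start a sequence */
    UTF8_ERR_UNEXPECTED_CONT = -3, /* continuation byte where a lead byte belongs */
    UTF8_ERR_BAD_CONT        = -4, /* lead byte not followed by a continuation byte */
    UTF8_ERR_OVERLONG        = -5, /* value encoded in more bytes than necessary */
    UTF8_ERR_RANGE           = -6  /* surrogate, or above 0x10FFFF */
};

/*
 * Decode one code point from s[0..len).
 * On success, store it in *out and return the number of bytes consumed (1..4).
 * On failure, return one of the negative UTF8_ERR_* values; *out is untouched.
 */
int utf8_decode(const unsigned char *s, size_t len, uint32_t *out)
{
    /* Smallest code point that legitimately needs n bytes, indexed by n. */
    static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    uint32_t cp;
    int n, i;

    if (len == 0)
        return UTF8_ERR_TRUNCATED;

    if (s[0] < 0x80) {                 /* plain ASCII */
        *out = s[0];
        return 1;
    }
    if ((s[0] & 0xC0) == 0x80)         /* 10xxxxxx cannot start a sequence */
        return UTF8_ERR_UNEXPECTED_CONT;

    if ((s[0] & 0xE0) == 0xC0)      { n = 2; cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { n = 3; cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { n = 4; cp = s[0] & 0x07; }
    else
        return UTF8_ERR_INVALID_BYTE;  /* 0xF8..0xFF */

    for (i = 1; i < n; i++) {
        if ((size_t)i >= len)
            return UTF8_ERR_TRUNCATED;
        if ((s[i] & 0xC0) != 0x80)
            return UTF8_ERR_BAD_CONT;
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
    }

    /* The check in question: reject anything not in shortest form. */
    if (cp < min_cp[n])
        return UTF8_ERR_OVERLONG;
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return UTF8_ERR_RANGE;

    *out = cp;
    return n;
}
```

With a check like this, the two-byte sequence C0 AF (an overlong '/') comes back as UTF8_ERR_OVERLONG instead of decoding to 0x2F, and a caller can tell that apart from a truncated or otherwise malformed sequence.
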
Just a warning, though I have some source code lying around that may be able to substitute for the offending bits. Also: rather than using -1 as a catch-all error return value, I'm leaning towards a set of distinct negative values to distinguish the problems: truncated sequence, invalid UTF-8 byte, unexpected continuation byte, and so on.

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
