Followup to: <[EMAIL PROTECTED]>
By author: Paul Michel <[EMAIL PROTECTED]>
In newsgroup: linux.utf8
>
> After reading a past discussion related to utf-8
> support in glibc 2.2, I was not sure of the conclusion
> regarding strcoll.
> I understood that all char functions work on bytes.
> None of them handle utf-8 in the sense that all these
> functions do not recognise any utf-8 encoded
> character, but only bytes. Now depending on what kind
> of processing they actually do, they can correctly
> handle utf-8 data (e.g. strcpy).
>
> IMHO, strcoll cannot correctly handle utf-8 encoded
> characters since collation needs explicit knowledge of
> characters. For instance, collation rules for Finnish
> are particular regarding some letters that are encoded
> on more than one byte in utf-8 (e.g. ö, 0xC3 0xB6 in
> utf-8).
>
Since strcoll() assigns meaning to the characters in a string, it
obviously needs to decode the UTF-8 sequences. The exception, of
course, is the "C" locale, where sorting is defined to be in binary
order: no decoding is needed there, since UTF-8 byte order is
identical to Unicode code point order. (Fortunately; otherwise it
would be very confusing to know what the "C" locale should do.)
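
A minimal sketch of the binary-order point, in C. This is not glibc's
implementation: utf8_decode(), codepoint_cmp(), sign() and
orders_agree() are hypothetical helper names made up for illustration,
and the decoder assumes well-formed UTF-8 and does no real validation.
It checks that comparing bytes with strcmp(), comparing with strcoll()
in the "C" locale, and comparing decoded code points all give the same
ordering.

```c
#include <assert.h>
#include <locale.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode one UTF-8 sequence starting at s, store the
   code point in *cp, and return the number of bytes consumed.
   Assumption: the input is well-formed UTF-8; nothing is validated beyond
   the lead byte of a 4-byte sequence. */
static size_t utf8_decode(const unsigned char *s, uint32_t *cp)
{
    if (s[0] < 0x80) {                      /* 1 byte: 0xxxxxxx */
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0) {            /* 2 bytes: 110xxxxx 10xxxxxx */
        *cp = ((uint32_t)(s[0] & 0x1F) << 6) | (uint32_t)(s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0) {            /* 3 bytes: 1110xxxx + 2 trailing */
        *cp = ((uint32_t)(s[0] & 0x0F) << 12)
            | ((uint32_t)(s[1] & 0x3F) << 6)
            | (uint32_t)(s[2] & 0x3F);
        return 3;
    }
    assert((s[0] & 0xF8) == 0xF0);          /* 4 bytes: 11110xxx + 3 trailing */
    *cp = ((uint32_t)(s[0] & 0x07) << 18)
        | ((uint32_t)(s[1] & 0x3F) << 12)
        | ((uint32_t)(s[2] & 0x3F) << 6)
        | (uint32_t)(s[3] & 0x3F);
    return 4;
}

/* Compare two well-formed UTF-8 strings code point by code point.
   Returns -1, 0 or 1. */
static int codepoint_cmp(const char *a, const char *b)
{
    const unsigned char *pa = (const unsigned char *)a;
    const unsigned char *pb = (const unsigned char *)b;

    while (*pa && *pb) {
        uint32_t ca, cb;
        pa += utf8_decode(pa, &ca);
        pb += utf8_decode(pb, &cb);
        if (ca != cb)
            return ca < cb ? -1 : 1;
    }
    if (*pa) return 1;      /* a is longer, so it sorts after b */
    if (*pb) return -1;     /* b is longer, so it sorts after a */
    return 0;
}

/* Reduce a comparison result to -1, 0 or 1. */
static int sign(int x)
{
    return (x > 0) - (x < 0);
}

/* Returns 1 if byte order (strcmp), "C"-locale collation (strcoll) and
   code point order all agree for the pair (a, b), else 0. */
static int orders_agree(const char *a, const char *b)
{
    setlocale(LC_COLLATE, "C");
    int by_bytes = sign(strcmp(a, b));
    return by_bytes == codepoint_cmp(a, b)
        && by_bytes == sign(strcoll(a, b));
}
```

For instance, orders_agree("z", "\xC3\xB6") compares "z" (U+007A)
with "ö" (U+00F6): the byte 0x7A is less than 0xC3, and the code point
0x7A is less than 0xF6, so both orders put "z" first. A locale whose
rules sort "ö" next to "o" (German, for example) has to put "ö" before
"z", and strcoll() cannot do that without decoding the bytes.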
-hpa
--
<[EMAIL PROTECTED]> at work, <[EMAIL PROTECTED]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt <[EMAIL PROTECTED]>
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/