Re: strcoll for utf-8

H. Peter Anvin Mon, 07 Jan 2002 14:20:19 -0800

Followup to:  <[EMAIL PROTECTED]>
By author:    Paul Michel <[EMAIL PROTECTED]>
In newsgroup: linux.utf8
> 
> After reading a past discussion related to utf-8
> support in glibc 2.2, I was not sure of the conclusion
> regarding strcoll.
> I understood that all char functions work on bytes.
> None of them handle utf-8 in the sense that all these
> functions do not recognise any utf-8 encoded
> character, but only bytes. Now depending on what kind
> of processing they actually do, they can correctly
> handle utf-8 data (e.g. strcpy).
> 
> IMHO, strcoll cannot correctly handle utf-8 encoded
> characters since collation needs explicit knowledge of
> characters. For instance, collation rules for Finnish
> are particular regarding some letters that are encoded
> on more than one byte in utf-8 (e.g. ö, xC3B6 in
> utf-8).
>


Since strcoll() assigns meaning to strings, it obviously needs to
decode the UTF-8 characters; except, of course, in the "C" locale
(where sorting is defined to be in binary order), since UTF-8 byte
order is identical to Unicode code point order (fortunately... it would
be very hard to know what the "C" locale should do otherwise).
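
To make the byte-order point concrete, here is a rough sketch in C. It
compares two UTF-8 strings three ways: bytewise with strcmp(), by decoded
code point, and with strcoll() after selecting the "C" locale. The helper
names (next_codepoint, cmp_codepoints, sign_of, orders_agree) are made up
for this illustration and are not part of glibc, and the decoder assumes
well-formed UTF-8 with no validation at all.

```c
#include <locale.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode one UTF-8 sequence starting at *p and
 * advance *p past it. Assumes well-formed input; nothing is validated. */
uint32_t next_codepoint(const unsigned char **p)
{
    const unsigned char *s = *p;
    uint32_t cp;
    int extra;

    if (s[0] < 0x80)      { cp = s[0];        extra = 0; } /* 1 byte  */
    else if (s[0] < 0xE0) { cp = s[0] & 0x1F; extra = 1; } /* 2 bytes */
    else if (s[0] < 0xF0) { cp = s[0] & 0x0F; extra = 2; } /* 3 bytes */
    else                  { cp = s[0] & 0x07; extra = 3; } /* 4 bytes */

    s++;
    while (extra-- > 0)
        cp = (cp << 6) | (uint32_t)(*s++ & 0x3F);
    *p = s;
    return cp;
}

/* Hypothetical helper: compare two UTF-8 strings by code point value.
 * Returns -1, 0 or 1. */
int cmp_codepoints(const char *a, const char *b)
{
    const unsigned char *pa = (const unsigned char *)a;
    const unsigned char *pb = (const unsigned char *)b;

    for (;;) {
        if (*pa == 0 || *pb == 0)
            return (*pa != 0) - (*pb != 0); /* shorter string sorts first */
        uint32_t ca = next_codepoint(&pa);
        uint32_t cb = next_codepoint(&pb);
        if (ca != cb)
            return ca < cb ? -1 : 1;
    }
}

/* Reduce a comparison result to -1, 0 or 1. */
int sign_of(int x)
{
    return (x > 0) - (x < 0);
}

/* Returns 1 if byte order, code point order and "C"-locale strcoll()
 * all rank the pair the same way, 0 otherwise. */
int orders_agree(const char *a, const char *b)
{
    setlocale(LC_COLLATE, "C"); /* the "C" locale is always available */
    int bytes = sign_of(strcmp(a, b));
    return bytes == cmp_codepoints(a, b)
        && bytes == sign_of(strcoll(a, b));
}
```

For example, orders_agree("z", "\xC3\xB6") compares "z" with the UTF-8
bytes of the "ö" mentioned above. Anything language-specific, such as the
Finnish rules, would need a real collation locale selected with
setlocale(), and whether a suitable UTF-8 locale is installed depends on
the system.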

        -hpa
-- 
<[EMAIL PROTECTED]> at work, <[EMAIL PROTECTED]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt    <[EMAIL PROTECTED]>
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
