Re: Linux UTF-8 locales sort SPACE at level 4

Denis Barbier Tue, 21 Mar 2006 16:11:21 -0800

On Tue, Mar 21, 2006 at 06:46:45PM +0000, Markus Kuhn wrote:
[...]
> References:
> 
>   - Unicode Collation Algorithm (UCA), http://www.unicode.org/reports/tr10/
> 
>   - ISO TR 14652 (draft:
>     http://www.cl.cam.ac.uk/~mgk25/volatile/ISO-14652.pdf)


ISO TR 14652 does not deal with collation; GNU libc locales are based on
ISO 14651.
A draft is available at http://dkuug.dk/jtc1/sc22/open/n2933.pdf
The iso14651_t1 file is intended to be the Common Template Table defined
in appendix A.

>   - http://sources.redhat.com/bugzilla/show_bug.cgi?id=374

This bug report does not contain any information.  OTOH
      http://sources.redhat.com/bugzilla/show_bug.cgi?id=388
explains that the current sorting order is wrong in Polish.

>   - https://bugzilla.novell.com/show_bug.cgi?id=152778

Access denied.

> Example:
> 
> $ cat >demo.txt
> death
> de luge
> de-luge
> deluge
> de-luge
> de Luge
> de-Luge
> deLuge
> de-Luge
> demark
> ^D
> 
> and then try
> 
> $ LC_COLLATE=C            sort demo.txt
> $ LC_COLLATE=en_GB.UTF-8  sort demo.txt
> $ LC_COLLATE=en_GB        sort demo.txt

Out of curiosity, do you see differences between en_GB and en_GB.UTF-8?
There should be none.
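
A quick way to check this is a small shell helper.  This is only a
sketch: compare_collation is a made-up name, and it assumes a POSIX
shell, a sort(1) that honours LC_COLLATE, and that both locales are
actually installed (see `locale -a`).  If a locale is missing, sort
typically falls back to C ordering without complaint, so an "identical"
result proves nothing in that case.

```shell
# Hypothetical helper: report whether two locales order a file the same way.
# $1, $2: locale names; $3: file to sort.
compare_collation() {
    # LC_ALL overrides LC_COLLATE, so blank it out for these two runs only.
    a=$(LC_ALL= LC_COLLATE="$1" sort "$3")
    b=$(LC_ALL= LC_COLLATE="$2" sort "$3")
    if [ "$a" = "$b" ]; then
        echo "identical"
    else
        echo "different"
    fi
}
```

With the demo.txt from the quoted example, this would be called as
`compare_collation en_GB en_GB.UTF-8 demo.txt`.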

> and see the difference with how your dictionary or phone book sorts
> these.

My understanding is that the authors of ISO 14651 tried to gather some
general rules that are relevant to several locales, and that other
locales are expected to derive from these rules where needed.  The
problem is that very few people have submitted changes and, as can be
seen above, it is sometimes hard to push changes into GNU libc.  But at
least this is an open process: other distributions can make up their
own minds and include the requested changes if they want.

Denis

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
