On Wed, Nov 08, 2000 at 12:32:49PM +0100, Karlsson Kent - keka wrote:
>
>
> > -----Original Message-----
> > From: Roozbeh Pournader [mailto:[EMAIL PROTECTED]]
> > Sent: Wednesday, November 08, 2000 11:58 AM
> > To: Linux i18n
> > Subject: Sorting and combining diacritical marks
> >
> >
> >
> > A quick look at the ISO 14651 tables,
> > http://www.iso.ch/ittf/ISO14651_2000_TABLE1.htm, shows that
> > U+0308 (COMBINING DIAERESIS) is in the table, but it is excluded
> > from the "iso14651_t1" file in glibc. Does this mean that the
> > programmer should normalize the strings before sorting, or that
> > the umlaut will be ignored?
>
> I just took a quick look at iso14651_t1. It seems to be extremely
> old, and very limited. Not at all like the "2000" table, which
> 1) covers all of Unicode 2.1, and 2) does handle combining diacritics.
> I guess Keld might know the details as to why the "t1" table has
> not been updated.
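
On the normalization question, here is a minimal Python sketch of what
normalizing before sorting could look like. It is not glibc code, and the
sample strings and the helper name nfc are made up for illustration. It
only shows that NFC composes "u" followed by U+0308 into the single code
point U+00FC, so both spellings give the same sort key; it says nothing
about how a particular glibc locale weights the combining mark.

```python
import unicodedata

# Two spellings of the same name (sample data, assumed for illustration).
decomposed = "Mu\u0308ller"   # 'u' followed by U+0308 COMBINING DIAERESIS
precomposed = "M\u00fcller"   # single code point U+00FC


def nfc(text):
    """Return text in Unicode Normalization Form C (canonical composition)."""
    return unicodedata.normalize("NFC", text)


# The raw strings differ code point by code point ...
raw_equal = decomposed == precomposed

# ... but compare equal once both are normalized.
normalized_equal = nfc(decomposed) == nfc(precomposed)

# Using the normalized form as the sort key makes both spellings sort alike.
names = ["Zoe", decomposed, "Adam", precomposed]
sorted_names = sorted(names, key=nfc)
```

Whether normalization is needed depends on whether the collation table in
use carries an entry for the combining character.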
Ulrich took the table from an older draft, in which uppercase
and lowercase letters were kept separate. That way a regular
expression like [a-c]* does not match names with an initial
uppercase letter, which could otherwise surprise users, e.g.
when deleting files and not expecting files with initial
uppercase letters to be deleted.
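
A small Python sketch of the effect, under stated assumptions: the two
orderings below are hypothetical and are not copied from iso14651_t1 or
any real locale file, and range_members and matched_names are
illustrative helpers that only model "first character falls inside the
range", as in a pattern like [a-c]* applied to file names.

```python
import string

LOWER = string.ascii_lowercase
UPPER = string.ascii_uppercase

# Hypothetical collation orders (assumptions, for illustration only).
SEPARATED = LOWER + UPPER                                          # abc...zABC...Z
INTERLEAVED = "".join(lo + up for lo, up in zip(LOWER, UPPER))     # aAbBcC...


def range_members(order, first, last):
    """Return the characters that the range first-last covers under order."""
    start, end = order.index(first), order.index(last)
    return order[start:end + 1]


def matched_names(order, first, last, names):
    """Return the names whose first character lies in the range first-last."""
    members = range_members(order, first, last)
    return [name for name in names if name[0] in members]


names = ["apple", "Apple", "Banana", "cherry", "dog"]

separated_hits = matched_names(SEPARATED, "a", "c", names)
interleaved_hits = matched_names(INTERLEAVED, "a", "c", names)
```

With the separated order the range a-c covers only "abc", so just
"apple" and "cherry" are selected. With the interleaved order it covers
"aAbBc", so "Apple" and "Banana" are selected as well.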
Keld
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/