Re: [I18n]Re: Li18nux Locale Name Guideline Public Review

Pablo Saratxaga Tue, 22 Jan 2002 03:23:54 -0800

Kaixo!

I think you misunderstood one of my comments:

On Tue, Jan 22, 2002 at 11:11:20AM +0100, Bram Moolenaar wrote:

> > > Locale names are always in ASCII and thus are indifferent to the current
> > > locale.
> > 
> > You are wrong.
> > uppercase(i) will give different results in English or Turkish locale.
> > same for lowercase(I).
> > Maybe those are the only two cases involving ascii letters; but that also
> > mean that if the name of a Turkish locale includes an 'i' the result may be
> > that it won't work as expected.
> 
> I don't know the uppercase() or lowercase() function.

Or whatever they are named; I mean uppercasing and lowercasing.

> It's very easy to
> change case without using the current locale, just using the ASCII
> characters.

But 'i' and 'I' *ARE* ASCII characters; yet they behave in a locale-dependent
way for uppercasing and lowercasing.
ASCII and the "C" locale are not the same thing.

> I think it's obvious that the current locale should not
> apply to the locale name itself.

It is not obvious.
If the case insensitivity is implemented with standard functions, like
strcasecmp(), then the result will depend on the current locale.
That means, for example, that setting LANG=english while in "C"
will work, then setting LANG=turkish while in english will work too;
but setting back LANG=english won't, as in Turkish 'i' and 'I' are not the
same letter in different case.

Try it; write a very simple program that calls strcasecmp("i", "I") and run
it in a Turkish locale.
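
A minimal sketch of such a test, assuming a POSIX system. The helper name
same_ignoring_case_in() is made up for illustration, and the Turkish locale
you pass to it (for example "tr_TR.UTF-8" or "tr_TR.ISO-8859-9", whichever
your system provides) must actually be installed:

```c
#include <locale.h>
#include <stddef.h>
#include <strings.h>   /* strcasecmp() (POSIX) */

/* Hypothetical helper: select locale_name for all categories, then compare
 * "i" with "I" using strcasecmp().
 * Returns  1 if the two strings compare equal ignoring case,
 *          0 if they compare different,
 *         -1 if the locale could not be selected (e.g. not installed). */
int same_ignoring_case_in(const char *locale_name)
{
    if (setlocale(LC_ALL, locale_name) == NULL)
        return -1;
    return strcasecmp("i", "I") == 0 ? 1 : 0;
}
```

Call it once with "C" and once with the Turkish locale name and print both
results. What you get for the Turkish locale depends on the C library and on
the locale's character set, which is exactly the point.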

> > The gains of having case insensitive locale naming are in practice
> > very small; and it may even introduce bugs and problems for Turkish
> > users (they already have too often problems due to the peculiar case
> > changing rules for dot and dotless i's).
> 
> If we treat the locale name as plain ASCII then this problem won't
> exist.

This problem has nothing to do with the charset encoding; it will be the
same even if you use EBCDIC.
The problem is not with the encoding, but with the notion of "case
insensitivity" itself. Simply put, the case pairs are not the same in all
locales; so, in order for it to work, you must implement a wrong case
insensitivity, not a real one, but one that is broken in the Turkish locale,
creating user unfriendliness for users of the Turkish locale.
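
For illustration, such a deliberately fixed comparison could look like the
sketch below. The names ascii_lower() and locale_name_equal() are invented
for this example and come from no guideline; the range check also assumes an
ASCII-compatible execution character set (on EBCDIC the letters are not
contiguous, so a lookup table would be needed instead):

```c
/* Fold only 'A'..'Z' to 'a'..'z'. The current locale is never consulted,
 * so 'I' always maps to 'i', even when the user's locale is Turkish. */
static int ascii_lower(int c)
{
    return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

/* Hypothetical comparison for locale names.
 * Returns 1 if a and b are equal ignoring ASCII case, 0 otherwise. */
int locale_name_equal(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (ascii_lower((unsigned char)*a) != ascii_lower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    /* Equal only if both strings ended at the same position. */
    return *a == *b;
}
```

This gives the same answer in every locale, but only because it hard-codes
the English case pairs; it is not what a Turkish user would call case
insensitivity.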

You must take that problem into account any time you want to use case
insensitivity: case insensitivity is not an invariable thing, it is locale
dependent. As a general rule, anything dealing with chars (as opposed to
dealing with bytes), that is anything giving a human meaning to a mere
collection of bits, must be treated as locale dependent.
A lot of programs out there are buggy because they ignore that problem,
which leads to case insensitivity not working in Turkish (in some cases
it may even cause segmentation faults), wrong character counts and positioning
(as the programmer wrongly assumed a character will always be 1 column wide),
etc.
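
A small sketch of the bytes versus chars difference, under the loud
assumption that the string is valid UTF-8 (in another encoding the rule is
different, which is why this is locale dependent). The function name
utf8_char_count() is made up for the example:

```c
#include <stddef.h>

/* Count characters in s, ASSUMING s is valid UTF-8: every byte that is not
 * a continuation byte (bit pattern 10xxxxxx) starts a new character. */
size_t utf8_char_count(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For a string such as "b\xC3\xA9n" strlen() counts four bytes while this
function counts three characters. Neither number is a column count; for
positioning you would still need something like wcwidth() in the user's
locale.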

-- 
Ki ça vos våye bén,
Pablo Saratxaga

http://www.srtxg.easynet.be/            PGP Key available, key ID: 0x8F0E4975

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
