Re: mbstoupper or utf8toupper

Thomas Wolff Mon, 10 Jan 2005 18:17:11 -0800

Antoine Leca <[EMAIL PROTECTED]> wrote:
> On Thursday, January 6th, 2005 15:57Z Michael B Allen wrote:
> >
> > So you're saying if I do towlower(0x0130) (dotted I) in a Turkish
> > locale I'll get 0x0069 (ASCII i)?
> 
> Unless I am missing something, this is the sensible behaviour under ANY
> locale...
> 
> What varies is the behaviour of towlower(L'I'), which should usually
> return L'i', as you would expect, but would return L'\u0131' (dotless i, ==
> 0x0131 in Unicode locales) under a Turkish, Azeri, or other Turkic-language
> locale.
Yes.
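
To make the locale dependence concrete, here is a minimal sketch. The helper
name lower_in_locale is my own invention for illustration, and which locale
names exist depends entirely on what is installed on the system, so the
helper reports failure instead of guessing.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper (not part of any library): lowercase wc under the
   named LC_CTYPE locale, then restore the previous locale.
   Returns 0 on success, -1 if the locale is unavailable or memory runs out. */
int lower_in_locale(const char *locale_name, wint_t wc, wint_t *out)
{
    /* setlocale() may reuse its return buffer, so copy the current name. */
    const char *cur = setlocale(LC_CTYPE, NULL);
    char *saved = NULL;
    if (cur != NULL) {
        saved = malloc(strlen(cur) + 1);
        if (saved == NULL)
            return -1;
        strcpy(saved, cur);
    }

    if (setlocale(LC_CTYPE, locale_name) == NULL) {
        free(saved);
        return -1;              /* locale not installed */
    }

    *out = towlower(wc);        /* the mapping depends on LC_CTYPE */

    if (saved != NULL) {
        setlocale(LC_CTYPE, saved);
        free(saved);
    }
    return 0;
}
```

With the "C" locale this yields L'i' for L'I'. With a Turkish locale (on many
systems named something like "tr_TR.UTF-8", but that name is an assumption)
the result should be 0x0131, as described above.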


> And yes, in UTF-8 the resulting size varies.
Well, the number of bytes, yes, but that doesn't affect the definitions 
of lower/upper transformation.
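
A small sketch of the size point, for illustration only (utf8_len is a
made-up name, not a library function): it computes how many bytes UTF-8
needs for a single code point, which shows that 'I' (U+0049) takes one byte
while dotless i (U+0131) takes two.

```c
#include <stddef.h>

/* Hypothetical helper: number of bytes UTF-8 uses to encode one code point.
   Returns 0 for values that UTF-8 cannot encode (surrogates, > U+10FFFF). */
size_t utf8_len(unsigned long cp)
{
    if (cp < 0x80)
        return 1;               /* ASCII range */
    if (cp < 0x800)
        return 2;               /* includes U+0130 and U+0131 */
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;               /* surrogates are not encodable */
    if (cp < 0x10000)
        return 3;
    if (cp < 0x110000)
        return 4;
    return 0;                   /* beyond the Unicode range */
}
```

So lowercasing a string can change its byte length, even though the mapping
itself is defined on characters.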

> Even more, tolower('I') will return 'i' under "normal" locales, but
> (according to the standard) should return 'I' under a Turk locale! It is the
> way the standard is written (C99 7.4.2.1; note this is changed from ANSI C
> 4.3.2.1)
No. How did you arrive at that reading?

>     If the argument is a character for which /isupper/ is true
>     and there are one or more corresponding characters, as
>     specified by the current locale, for which /islower/ is true,
>     the /tolower/ function returns one of the corresponding
>     characters (always the same one for any given locale);
>     otherwise, the argument is returned unchanged.
> 
> First part is true, but the second part is not, since the "corresponding
> character" could only be dotless i, which is not a single-byte character in
> a UTF-8 locale. As a result, the "otherwise" clause applies.
No, no, no. Why are you speaking about bytes here? This is totally 
irrelevant and the section you quoted doesn't mention any bytes either.
The lowercase of "I" in a Turkish locale is clearly dotless i.

> And yes, I find this quite surprising.
It would be. Fortunately, it's not.


Regards,
Thomas

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
