Re: mbstoupper or utf8toupper

Antoine Leca Mon, 10 Jan 2005 01:44:46 -0800

On Thursday, January 6th, 2005 15:57Z Michael B Allen wrote:
>
> So you're saying if I do towlower(0x0130) (dotted I) in a Turkish
> locale I'll get 0x0069 (ASCII i)?


Unless I am missing something, this is the sensible behaviour under ANY
locale...

What does vary is the behaviour of towlower(L'I'), which should usually
return L'i', as you would expect, but would return L'\u0131' (dotless i, ==
0x0131 in Unicode locales) under a Turkish, Azeri or any other Turkic-language
locale.
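A minimal C sketch for checking this on a given system. It assumes the locale names "en_US.UTF-8" and "tr_TR.UTF-8", which may or may not be installed. lower_in_locale() and print_lowercase_table() are hypothetical helper names; they just call towlower() with LC_CTYPE temporarily switched, so the output reflects whatever case tables the local C library ships.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <wctype.h>

/* Apply towlower() to wc with LC_CTYPE temporarily set to loc.
   Returns 1 and stores the result in *out, or 0 if the locale is not
   available. The previous LC_CTYPE setting is restored afterwards. */
static int lower_in_locale(const char *loc, wint_t wc, wint_t *out)
{
    char saved[256];
    const char *cur = setlocale(LC_CTYPE, NULL);

    assert(loc != NULL && out != NULL);
    if (cur == NULL || strlen(cur) >= sizeof saved)
        return 0;
    strcpy(saved, cur);            /* setlocale may reuse its buffer */

    if (setlocale(LC_CTYPE, loc) == NULL)
        return 0;                  /* locale not installed here */
    *out = towlower(wc);
    setlocale(LC_CTYPE, saved);
    return 1;
}

/* Print towlower() of 'I' and U+0130 under a few locales.
   The locale names are assumptions; availability varies by system. */
static void print_lowercase_table(void)
{
    static const char *const locales[] = { "C", "en_US.UTF-8", "tr_TR.UTF-8" };
    static const wint_t inputs[] = { 0x0049, 0x0130 };  /* 'I', dotted I */

    for (size_t i = 0; i < sizeof locales / sizeof locales[0]; i++)
        for (size_t j = 0; j < sizeof inputs / sizeof inputs[0]; j++) {
            wint_t r;
            if (lower_in_locale(locales[i], inputs[j], &r))
                printf("%-12s towlower(U+%04lX) = U+%04lX\n", locales[i],
                       (unsigned long)inputs[j], (unsigned long)r);
            else
                printf("%-12s not available\n", locales[i]);
        }
}
```

Calling print_lowercase_table() from main() should show 'I' mapping to U+0069 in most locales and, if a Turkish locale is installed and follows the Turkish casing rules, to U+0131 there.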

And yes, in UTF-8 the resulting size varies: 'I' (U+0049) is a single byte,
while dotless i (U+0131) takes two bytes (0xC4 0xB1).
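To make the size difference concrete, here is a small UTF-8 encoder (utf8_encode() is a hypothetical helper, not a standard function). It shows that 'I' encodes to one byte while both U+0130 and U+0131 need two, so a case conversion done in place on a UTF-8 buffer cannot assume the output is as long as the input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode code point cp as UTF-8 into buf (room for 4 bytes).
   Returns the number of bytes written, or 0 for surrogates and
   values above U+10FFFF, which have no UTF-8 encoding. */
static size_t utf8_encode(uint32_t cp, unsigned char buf[4])
{
    assert(buf != NULL);
    if (cp < 0x80) {                       /* 1 byte: 0xxxxxxx */
        buf[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 2 bytes: 110xxxxx 10xxxxxx */
        buf[0] = (unsigned char)(0xC0 | (cp >> 6));
        buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)      /* UTF-16 surrogates */
        return 0;
    if (cp < 0x10000) {                    /* 3 bytes */
        buf[0] = (unsigned char)(0xE0 | (cp >> 12));
        buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 4 bytes */
        buf[0] = (unsigned char)(0xF0 | (cp >> 18));
        buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For example, utf8_encode(0x0049, buf) writes the single byte 0x49, whereas utf8_encode(0x0131, buf) writes 0xC4 0xB1.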

What's more, tolower('I') will return 'i' under "normal" locales, but
(according to the standard) should return 'I' unchanged under a UTF-8 Turkish
locale! That is the way the standard is written (C99 7.4.2.1; note that this
wording changed from ANSI C 4.3.2.1):

    If the argument is a character for which /isupper/ is true
    and there are one or more corresponding characters, as
    specified by the current locale, for which /islower/ is true,
    the /tolower/ function returns one of the corresponding
    characters (always the same one for any given locale);
    otherwise, the argument is returned unchanged.

The first condition holds, since isupper('I') is true, but the second does
not: the only "corresponding character" would be dotless i, which is not a
single-byte character in a UTF-8 locale, so no value tolower() can return
represents it. As a result, the "otherwise" clause applies.
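The rule quoted above can be modelled with the wide-character functions. c99_style_tolower() below is a hypothetical illustration of the standard's wording, not how any particular C library implements tolower(): it finds the wide lowercase counterpart with towlower() and then asks wctob() whether that character has a single-byte form; if it does not, the argument comes back unchanged.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdio.h>    /* EOF */
#include <wchar.h>    /* btowc, wctob */
#include <wctype.h>   /* towlower */

/* Model of C99 7.4.2.1 in the current LC_CTYPE locale.
   c must be EOF or a value representable as unsigned char. */
static int c99_style_tolower(int c)
{
    assert(c == EOF || (c >= 0 && c <= UCHAR_MAX));
    if (c == EOF || !isupper(c))
        return c;                          /* not an uppercase character */

    wint_t wc = btowc(c);                  /* single byte -> wide char */
    if (wc == WEOF)
        return c;

    int lower = wctob(towlower(wc));       /* EOF if no single-byte form */
    if (lower == EOF || !islower(lower))
        return c;                          /* the "otherwise" clause */
    return lower;
}
```

Under the "C" locale this simply maps 'I' to 'i'. Under a UTF-8 Turkish locale, towlower() would give U+0131, wctob() would return EOF, and 'I' would be returned unchanged; under a single-byte Turkish locale such as ISO 8859-9, where dotless i is the byte 0xFD, the same model would return 0xFD instead.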

And yes, I find this quite surprising.


Antoine


--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
