Re: mbstoupper or utf8toupper

Antoine Leca Tue, 11 Jan 2005 04:36:21 -0800

On Tuesday, January 11th, 2005 02:14Z Thomas Wolff wrote:

> Antoine Leca wrote:
>> And yes, in UTF-8 the resulting size varies.
> Well, the number of bytes, yes, but that doesn't affect the
> definitions of lower/upper transformation.


In a way, yes, it does.
In C, UTF-8 strings (the subject of this mailing list, unless I am mistaken)
are stored in char, which is 8 bits wide. So when using UTF-8 you are
essentially using a variable-length encoding (if you stick to char), and
part of the C library does not scale very well to character sets that need
more than two bytes per character (as I unfortunately showed).

Of course, using UTF-32 (that is, towlower/towupper) avoids the problem (as
Michael explained when he opened the thread); but then you should be
prepared for the length of the result to vary when it is stored back into
UTF-8 chars, which was Michael's concern as I understood it.
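
To make the length variation concrete, here is a minimal sketch in C. It
is only an illustration, not anyone's proposed mbstoupper: the names
utf8_len, utf8_encode and toy_toupper are hypothetical, and toy_toupper is
a hard-coded stand-in for towupper() so that the example does not depend
on which locales are installed. It shows that U+0131 (dotless i) takes two
bytes in UTF-8 while its uppercase form, U+0049 'I', takes one.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of bytes UTF-8 needs for one code point (0 if not encodable). */
static size_t utf8_len(uint32_t cp)
{
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0; /* surrogates are invalid */
    if (cp < 0x10000) return 3;
    if (cp < 0x110000) return 4;
    return 0;
}

/* Encode one code point into buf (room for 4 bytes); returns bytes written. */
static size_t utf8_encode(uint32_t cp, unsigned char *buf)
{
    size_t n = utf8_len(cp);
    switch (n) {
    case 1:
        buf[0] = (unsigned char)cp;
        break;
    case 2:
        buf[0] = (unsigned char)(0xC0 | (cp >> 6));
        buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
        break;
    case 3:
        buf[0] = (unsigned char)(0xE0 | (cp >> 12));
        buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
        break;
    case 4:
        buf[0] = (unsigned char)(0xF0 | (cp >> 18));
        buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
        break;
    default:
        break; /* not encodable: nothing written */
    }
    return n;
}

/* Hypothetical stand-in for towupper(): covers only ASCII and two
 * code points whose uppercase form is shorter in UTF-8. */
static uint32_t toy_toupper(uint32_t cp)
{
    if (cp >= 0x61 && cp <= 0x7A) return cp - 0x20; /* a-z -> A-Z */
    if (cp == 0x0131) return 0x0049; /* dotless i -> I */
    if (cp == 0x017F) return 0x0053; /* long s    -> S */
    return cp;
}
```

With these helpers, utf8_len(0x0131) is 2 while utf8_len(toy_toupper(0x0131))
is 1, so a routine that converts in place cannot assume the output occupies
the same number of chars as the input; it has to size the destination buffer
from the converted code points.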


>> Even more, tolower('I') will return 'i' under "normal" locales, but
>> (according to the standard) should return 'I' under a Turk locale!

I agree with you that this is irrelevant to the discussion, and I should
have avoided bringing it into the thread, since it only adds confusion, as
you noted.
I am sorry for that.


Antoine


--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
