Re: mbstoupper or utf8toupper

Andries Brouwer Wed, 05 Jan 2005 01:31:16 -0800

On Wed, Jan 05, 2005 at 01:33:44AM -0500, Michael B Allen wrote:

> I know you guys are off on your own tangent here but I would just like to
> reiterate that when I asked Henry "Are these combinations common in
> usernames or pathnames?" I was referring specifically to only characters for
> which the upper and lower case codes for a character are encoded in UTF-8
> with a different number of bytes.


Turkish has i with dot and i without dot,
and, unsurprisingly, the upper case of dotted i is dotted I
and the lower case of dotless I is dotless i.
Now dotted i (U+0069) and dotless I (U+0049) are in the ASCII range (one UTF-8 byte each),
while dotless i is U+0131 and dotted I is U+0130; both take two bytes in UTF-8.

These are common vowels.
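A minimal sketch of the length change, assuming Turkish case rules. Python's built-in str.upper() applies the locale-independent Unicode mapping ('i' becomes 'I'), so the hypothetical helper turkish_upper below special-cases the dotted/dotless pair by hand and defers to str.upper() for everything else. It is an illustration, not a full implementation of the Turkish rules.

```python
# Hypothetical helper: Turkish-style upper-casing of a Python str.
# Only the dotted/dotless i pair is special-cased; all other
# characters fall back to the default Unicode mapping in str.upper().
TURKISH_UPPER = {
    "i": "\u0130",  # dotted i (U+0069, 1 UTF-8 byte) -> dotted I (U+0130, 2 bytes)
    "\u0131": "I",  # dotless i (U+0131, 2 bytes) -> dotless I (U+0049, 1 byte)
}


def turkish_upper(s: str) -> str:
    """Upper-case s using the Turkish mapping for the i/I pairs."""
    return "".join(TURKISH_UPPER.get(ch, ch.upper()) for ch in s)


def utf8_len(s: str) -> int:
    """Number of bytes s occupies when encoded as UTF-8."""
    return len(s.encode("utf-8"))


if __name__ == "__main__":
    for lower in ["i", "\u0131", "istanbul"]:
        upper = turkish_upper(lower)
        # !a prints ASCII-safe reprs so non-UTF-8 terminals do not choke.
        print(f"{lower!a}: {utf8_len(lower)} bytes -> {upper!a}: {utf8_len(upper)} bytes")
```

Because the encoded length can grow ('istanbul' is 8 bytes, its Turkish upper case 9) or shrink, a routine that upper-cases a UTF-8 buffer in place with a fixed-size output cannot handle these letters.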

Andries

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
