On Wed, Jan 05, 2005 at 01:33:44AM -0500, Michael B Allen wrote:

> I know you guys are off on your own tangent here but I would just like to
> reiterate that when I asked Henry "Are these combinations common in
> usernames or pathnames?" I was referring specifically to only characters for
> which the upper and lower case codes for a character are encoded in UTF-8
> with a different number of bytes.

Turkish has i with dot and i without dot. Unsurprisingly, the upper case
of dotted i is dotted I, and the lower case of dotless I is dotless i.

Now dotted i (U+0069) and dotless I (U+0049) are in the ASCII range
(a single UTF-8 byte each), while dotless i is U+0131 and dotted I is
U+0130. Both of those take two bytes in UTF-8, so in each Turkish case
pair one member is one byte and the other is two.

These are common vowels.

Andries

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
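
For anyone who wants to check the byte counts mentioned above, here is a
minimal sketch in Python. The names PAIRS, utf8_len and length_mismatches
are made up for this illustration, and the two Turkish case pairs are
hardcoded on the assumption that the language's built-in case mapping is
locale-independent and so does not apply the Turkish rules.

```python
# Turkish case pairs as (lowercase, uppercase); hardcoded for illustration.
PAIRS = [
    ("i", "\u0130"),   # dotted i (U+0069)  <-> dotted I (U+0130)
    ("\u0131", "I"),   # dotless i (U+0131) <-> dotless I (U+0049)
]


def utf8_len(ch: str) -> int:
    """Return the number of bytes used to encode ch in UTF-8."""
    return len(ch.encode("utf-8"))


def length_mismatches(pairs):
    """Return the pairs whose two members differ in UTF-8 byte length."""
    return [
        (lo, up, utf8_len(lo), utf8_len(up))
        for lo, up in pairs
        if utf8_len(lo) != utf8_len(up)
    ]


if __name__ == "__main__":
    for lo, up, n_lo, n_up in length_mismatches(PAIRS):
        print(f"U+{ord(lo):04X}: {n_lo} byte(s)  <->  U+{ord(up):04X}: {n_up} byte(s)")
```

Both pairs should be reported, since each one mixes an ASCII letter with a
two-byte character.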
