On Wed, Jan 05, 2005 at 10:57:58PM -0500, Michael B Allen wrote:
> Andries Brouwer said:
> > Turkish has i with dot and i without dot,
> > and unsurprisingly the upper case of dotted i is dotted I,
> > the lower case of dotless I is dotless i.
> > Now dotted i and dotless I are in the ASCII range (single UTF-8 byte),
> > while dotless i is U+0131, dotted I is U+0130. Both take two bytes.
> >
> > These are common vowels.
>
> So you're saying if I do towlower(0x0130) (dotted I) in a Turkish locale
> I'll get 0x0069 (ASCII i)?

Yes. Try a recent glibc system with locale tr_TR or tr_TR.utf8.

Of course many programs are buggy because their authors at first
disregard such details, and then there is a lot of mailing list
activity to get things fixed again.

Andries

(For an example of the type of problems: if someone decides to
recognize commands in arbitrary case, and does this by storing
them in English upper case and comparing that with toupper(cmd)
then things fail in a Turkish locale.)

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
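
One way to sidestep the command-matching failure described in the parenthetical is to fold case with ASCII rules only, so the current locale plays no part. The following is a minimal sketch, assuming the commands are pure ASCII keywords stored in upper case and that the execution character set is ASCII-compatible. The names ascii_toupper and command_matches are hypothetical helpers made up for illustration, not part of glibc or any other library.

```c
#include <assert.h>
#include <stddef.h>

/* Upper-case a single byte using ASCII rules only.
 * Unlike toupper(), this never consults the current locale,
 * so 'i' always maps to 'I', even under tr_TR.
 * Assumes an ASCII-compatible execution character set. */
int ascii_toupper(int c)
{
    return (c >= 'a' && c <= 'z') ? c - 'a' + 'A' : c;
}

/* Hypothetical helper: returns 1 if cmd equals keyword when
 * ASCII case is ignored in cmd, 0 otherwise.
 * keyword is expected to be stored in (English) upper case. */
int command_matches(const char *cmd, const char *keyword)
{
    assert(cmd != NULL && keyword != NULL);

    while (*cmd != '\0' && *keyword != '\0') {
        if (ascii_toupper((unsigned char)*cmd) != (unsigned char)*keyword)
            return 0;
        cmd++;
        keyword++;
    }
    /* Both strings must end at the same position. */
    return *cmd == '\0' && *keyword == '\0';
}
```

With this, command_matches("quit", "QUIT") does not depend on setlocale(), whereas a comparison built on toupper() can stop matching once a Turkish locale is active. The trade-off is that only ASCII letters are folded, which is usually what is wanted for fixed command keywords but not for user-visible text.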
