>> If this is not the case, is there any locale which will correctly
>> ctype() all of unicode?
>
> There's NO single 'correct' way although there can be a 'generic' default.
> isupper, islower, toupper, tolower and so forth work differently depending
> on the language/region of the locale.
The Unicode standard itself seems to provide standard mappings for upper,
lower, and title case. The locale system does not seem to have any support
for title case, and the LC_CTYPE file seems insufficient to describe Unicode
case conversion (it is now an N-char --> N-char mapping, and can only be
done in a string context, not letter by letter as implemented by
toupper/tolower).

However, the case conversions do seem to be canonical, and not hopelessly
language dependent. There are some differences, but I don't see why an
"isspace" function couldn't work correctly for all of Unicode and all
languages. The variations I see are whether or not to enable certain
options, such as "German GG-->bisse mappings" and "EastAsian wcwidth"
(those could be in the locale, I guess).

The main point I'm getting at is this: even if I'm in en_US.UTF-8, why can't
the upper/lower converter make an effort for other languages, such as
Vietnamese, whose case conversions are obvious to any Roman-alphabet user?

Duplicating the full case conversion tables for all installed locales does
seem a bit redundant. Instead, maybe a small file like:

    UnicodeVersion = Latest
    Normalization-Form = NFC
    German-bisse = OFF
    EastAsianWidths = ON
    etc...

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
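To make the point about string-context case conversion concrete, here is a rough sketch. It is not part of the proposal above: it assumes Python 3, whose built-in str methods take their case data from the Unicode character database rather than from LC_CTYPE, and the helper name describe_case is made up for illustration.

```python
# Sketch only: Python 3 str methods use Unicode case data, not the C locale.
# describe_case is a hypothetical helper, not an existing API.

def describe_case(text: str) -> dict:
    """Return the upper, lower and title forms of text, plus a flag
    saying whether upper-casing changed the number of characters."""
    upper = text.upper()
    return {
        "upper": upper,
        "lower": text.lower(),
        "title": text.title(),
        "length_changed": len(upper) != len(text),
    }

# German sharp s (U+00DF): one character upper-cases to two, which a
# per-character toupper() cannot express.
sharp_s = describe_case("stra\u00dfe")

# Vietnamese: U+1EC7 upper-cases to U+1EC6 with no locale set up at all.
vietnamese = describe_case("vi\u1ec7t nam")

# U+01C6 has distinct upper-case (U+01C4) and title-case (U+01C5) forms.
digraph = describe_case("\u01c6")

# Space, em space and ideographic space all count as white space.
spaces_ok = all(ch.isspace() for ch in "\u0020\u2003\u3000")

if __name__ == "__main__":
    for name, result in [("sharp s", sharp_s),
                         ("vietnamese", vietnamese),
                         ("digraph", digraph)]:
        print(name, result)
    print("spaces recognised:", spaces_ok)
```

The sharp s case is the one that breaks the one-character-in, one-character-out model of toupper/tolower; the other cases work the same way in any locale, which is the behaviour asked for above.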
