On Tue, 10 Dec 2002, Maiorana, Jason wrote:

> >> If this is not the case, is there any locale which will correctly
> >> ctype() all of unicode?
> >
> > There's NO single 'correct' way although there can be a 'generic'
> >isupper, islower, toupper, tolower and so forth work differently
> >on the language/region of the locale.
>
> The unicode standard itself seems to provide standard mappings of
> upper, lower, and title case. The locale system does not seem to

The Unicode standard does provide the *default*, but that default can be
tailored and overridden depending on language/locale/region. That is,
what's correct for English may not be correct for Turkish, Irish,
Swedish, Dutch, Russian and Bulgarian, however minor those differences
might be. That's what I meant when I wrote that there is not 'the'
correct way.

> but I dont see why an "isspace" function couldnt
> work correctly for all of unicode/all languages.

I also think that *some* categories in LC_CTYPE appear to be
language-neutral, but I can't be 100% sure. You never know.

> The main point I'm getting at, is that even if I'm in en_US.UTF-8,
> why cannot the upper/lower converter make an effort for the
> other languages, such as vietnamese, which have obvious case
> conversions to any roman-alphabet user.

I haven't disputed and won't dispute this point; I totally agree with
you. I want en_US.UTF-8 or any ll_CC.UTF-8 to work reasonably well for
the full repertoire of Unicode. That's exactly what Unicode is for,
among other things. However, you cannot assume that what's correct for
English as used in the US is also correct for French as used in Canada,
or for any other lang/script/region combination.

> Duplicating the full case conversion tables for all installed
> locales does seem a bit redundant... Instead maybe a small file
> like:

No doubt there should be an efficient way to share what's common
across lang/region/script and store only the 'tailoring delta'
separately for each lang/region/script. Well, someone might say
that disk is cheap...

  Jungshik

-- 
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
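The "shared default plus per-locale tailoring delta" idea discussed above can be sketched in a few lines. This is a minimal illustration, not how glibc or any real locale system stores its tables. It assumes Python's built-in `str.upper()`/`str.lower()` as the shared, locale-independent Unicode default mappings, and it stores for Turkish only the dotted/dotless i pairs that differ from that default. The names `TAILORINGS`, `to_upper` and `to_lower` are hypothetical. Context-sensitive rules (for example the Turkish handling of U+0307 after I) are omitted, and mapping one character at a time also loses context-dependent default rules such as Greek final sigma.

```python
# Minimal sketch: shared Unicode default case mapping + small per-locale deltas.
# Assumption: Python's str.upper()/str.lower() stand in for the shared default.
# TAILORINGS, to_upper and to_lower are hypothetical names for illustration.

# Each locale stores only the characters whose mapping differs from the default.
# Context-sensitive rules from SpecialCasing.txt are deliberately left out.
TAILORINGS = {
    "tr": {
        "upper": {"i": "\u0130"},                 # i -> U+0130 (capital I with dot above)
        "lower": {"I": "\u0131", "\u0130": "i"},  # I -> U+0131 (dotless i), U+0130 -> i
    },
}


def _map(text, locale, direction):
    """Use the locale's delta entry when present, else the Unicode default."""
    delta = TAILORINGS.get(locale, {}).get(direction, {})
    default = str.upper if direction == "upper" else str.lower
    # default(ch) may return more than one character (e.g. German sharp s -> "SS").
    return "".join(delta.get(ch, default(ch)) for ch in text)


def to_upper(text, locale="en"):
    return _map(text, locale, "upper")


def to_lower(text, locale="en"):
    return _map(text, locale, "lower")


if __name__ == "__main__":
    for word in ("istanbul", "ISPARTA", "stra\u00dfe"):
        print(word, to_upper(word), to_upper(word, "tr"),
              to_lower(word), to_lower(word, "tr"))
```

Locales without an entry simply fall back to the default, so only the small deltas need to be stored per lang/region/script.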
