On Tue, Dec 10, 2002 at 01:43:44PM -0500, Jungshik Shin wrote:
>
> On Tue, 10 Dec 2002, Maiorana, Jason wrote:
>
> > >> If this is not the case, is there any locale which will correctly
> > >> ctype() all of unicode?
> > >
> > > There's NO single 'correct' way although there can be a 'generic'
> > >isupper, islower, toupper, tolower and so forth work differently
> > >on the language/region of the locale.
> >
> > The unicode standard itself seems to provide standard mappings of
> > upper, lower, and title case. The locale system does not seem to
>
> Unicode standard does provide the *default*, but that default can be
> tailored and overridable depending on language/locale/region.
> That is, what's correct for English may not be correct
> for Turkish, Irish, Swedish, Dutch, Russian and Bulgarian
> however minor those differences might be. That's what I meant when
> I wrote that there is not 'the' correct way.
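
To make the quoted point about a tailorable default concrete, here is a
minimal sketch of a shared default mapping plus a small per-language
override. It is written in Python only because its str.upper() applies
the Unicode default mapping regardless of locale, which keeps the
default and the tailoring visibly separate. The names TAILORED_UPPER and
upper_for are made up for illustration; this is an assumption about how
such a split could look, not how glibc or 14652 stores the data.

```python
# Sketch: Unicode default case mapping plus a per-language "tailoring delta".
# TAILORED_UPPER and upper_for() are hypothetical names, not a glibc or
# ISO/IEC TR 14652 interface.

# Only the exceptions are stored per language; everything else falls
# through to the shared default (str.upper(), which ignores the locale).
TAILORED_UPPER = {
    "tr": {"i": "\u0130"},  # Turkish: i -> capital I with dot above
    "az": {"i": "\u0130"},  # Azerbaijani follows the same rule
}


def upper_for(text, lang):
    """Uppercase text, consulting the delta for lang before the default."""
    delta = TAILORED_UPPER.get(lang, {})
    return "".join(delta.get(ch, ch.upper()) for ch in text)


print(upper_for("istanbul", "en"))  # default mapping only
print(upper_for("istanbul", "tr"))  # Turkish delta applied to 'i'
```

A language with no entry in the table simply gets the default, so the
per-language data stays as small as the number of exceptions.
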
Also 14652 provides default values for upper and lower.
The default values in 14652 are actually taken from an earlier
version of the locales in glibc. I am not sure, but I think this is
what glibc still uses, and thus not the Unicode tables.

> > but I don't see why an "isspace" function couldn't
> > work correctly for all of unicode/all languages.
>
> I also think that *some* categories in LC_CTYPE
> appear to be language-neutral, but I can't be 100% sure. You never know.

I would think many of the LC_CTYPE categories are language neutral.

> > Duplicating the full case conversion tables for all installed
> > locales does seem a bit redundant... Instead maybe a small file
> > like:
>
> No doubt there should be an efficient way to share what's common
> across lang/region/script and store only the 'tailoring delta'
> separately for each lang/region/script. Well, someone might say that
> disk is cheap.....

14652 has tailoring for sorting, but not for LC_CTYPE.
We are looking at revising the just approved 14652, so proposals are
welcome.

Kind regards
keld
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
