On Tue, 10 Dec 2002, Maiorana, Jason wrote:

> >> If this is not the case, is there any locale which will correctly
> >> ctype() all of unicode?
> >
> >  There's NO single 'correct' way although there can be a 'generic'
> >isupper, islower, toupper, tolower and so forth work differently
> >on the language/region of the locale.
>
> The unicode standard itself seems to provide standard mappings of
> upper, lower, and title case. The locale system does not seem to

  The Unicode standard does provide the *default*, but that default can be
tailored or overridden depending on language/locale/region.
That is, what's correct for English may not be correct
for Turkish, Irish, Swedish, Dutch, Russian and Bulgarian
however minor those differences might be. That's what I meant when
I wrote that there is not 'the' correct way.
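
  To make the point concrete, here is a minimal sketch in Python (not the
C locale machinery discussed in this thread). It assumes Python's
str.upper() as a stand-in for the Unicode default mapping; TURKISH_UPPER
and to_upper are hypothetical names made up for illustration, not part of
any real locale database.

```python
# Hypothetical per-language override table. Its single entry is the
# Turkish rule that small dotted i uppercases to capital dotted I.
TURKISH_UPPER = {"i": "\u0130"}  # i -> İ


def to_upper(text, tailoring=None):
    """Uppercase text with the default mapping, letting a tailoring win."""
    overrides = tailoring or {}
    return "".join(overrides.get(ch, ch.upper()) for ch in text)


default_result = to_upper("istanbul")                 # "ISTANBUL"
turkish_result = to_upper("istanbul", TURKISH_UPPER)  # "\u0130STANBUL"
print(default_result, turkish_result)
```

  The same input yields two different results, each correct for its own
language, which is why a single table cannot be 'the' correct one.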

> but I don't see why an "isspace" function couldn't
> work correctly for all of unicode/all languages.

  I also think that *some* categories in LC_CTYPE
appear to be language-neutral, but I can't be 100% sure. You never know.
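
  As a rough illustration only: Python's str.isspace() takes no locale
argument and is driven by Unicode character properties, so it behaves as a
language-neutral test. This is an assumption-laden sketch of the idea and
says nothing about what a given C library's isspace/iswspace does;
CANDIDATES and space_report are hypothetical names.

```python
import unicodedata

# A few code points from different scripts and blocks to probe.
CANDIDATES = ["\u0020", "\u00a0", "\u2003", "\u3000", "\u200b"]


def space_report(chars):
    """Return (character name, isspace result) pairs for each character."""
    return [(unicodedata.name(ch), ch.isspace()) for ch in chars]


report = space_report(CANDIDATES)
for name, is_space in report:
    print(f"{name}: {is_space}")
```

  Only ZERO WIDTH SPACE comes out as non-space here, because it is a
format character rather than a space separator.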

> The main point I'm getting at, is that even if I'm in en_US.UTF-8,
> why cannot the upper/lower converter make an effort for the
> other languages, such as vietnamese, which have obvious case
> conversions to any roman-alphabet user.

  I haven't disputed this point and won't; I totally agree with you.
I want en_US.UTF-8 or any ll_CC.UTF-8 to work reasonably well for the
full repertoire of Unicode.  That's exactly what Unicode is for, among
other things.  However, you cannot assume that what's correct for English
as used in the US is also correct for French as used in Canada, or for
any other lang/script/region combination.

> Duplicating the full case conversion tables for all installed
> locales does seem a bit redundant... Instead maybe a small file
> like:

  No doubt there should be an efficient way to share what's common
across lang/region/script and store only the 'tailoring delta'
separately for each lang/region/script.  Well, someone might say that
disk is cheap.....
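
  One way such sharing could look, sketched in Python under stated
assumptions: str.lower() stands in for the single shared default table,
and LOWER_DELTAS, language_of and to_lower are hypothetical names. The
only delta stored is the well-known Turkish one; every other locale stores
nothing and falls through to the default.

```python
# Hypothetical delta store: one small dict per language, holding only the
# mappings that differ from the shared default.
LOWER_DELTAS = {
    "tr": {"I": "\u0131", "\u0130": "i"},  # I -> ı, İ -> i
}


def language_of(locale_name):
    """Extract the language part, e.g. 'tr' from 'tr_TR.UTF-8'."""
    return locale_name.split(".")[0].split("_")[0]


def to_lower(text, locale_name):
    """Lowercase text: look in the language's delta first, then fall
    back to the shared default mapping."""
    delta = LOWER_DELTAS.get(language_of(locale_name), {})
    return "".join(delta.get(ch, ch.lower()) for ch in text)


print(to_lower("ISTANBUL", "en_US.UTF-8"))   # istanbul
print(to_lower("ISTANBUL", "tr_TR.UTF-8"))   # ıstanbul
print(to_lower("VI\u1ec6T", "en_US.UTF-8"))  # việt
```

  With this layout en_US.UTF-8 still lowercases Vietnamese letters, since
they come from the shared default, while the Turkish locale costs only two
extra entries.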

  Jungshik

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
