RE: mixing LANG and LC_CTYPE

Maiorana, Jason Tue, 10 Dec 2002 08:57:01 -0800

>> If this is not the case, is there any locale which will correctly
>> ctype() all of unicode?
>
>  There's NO single 'correct' way although there can be a 'generic' default.
>  isupper, islower, toupper, tolower and so forth work differently depending
>  on the language/region of the locale.


The Unicode standard itself seems to provide standard mappings of
upper, lower, and title case. The locale system does not seem to
have any support for title case, and the LC_CTYPE file seems insufficient
to describe Unicode case conversion (it is now an N-char --> M-char
mapping, and can only be done in a string context, not
letter by letter as implemented by toupper/tolower).
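
To illustrate the string-context point, here is a minimal sketch in Python 3, chosen only because its built-in str methods apply Unicode's full case mappings. It is not a statement about how any libc locale behaves, and the helper name length_changes is made up for the example.

```python
# Full Unicode case mapping works on strings: one code point in,
# possibly several code points out.
sharp_s = "\u00df"            # LATIN SMALL LETTER SHARP S
upper_sharp_s = sharp_s.upper()

# Title case is a third form, distinct from upper case, for some letters.
dz = "\u01c6"                 # LATIN SMALL LETTER DZ WITH CARON
upper_dz = dz.upper()         # U+01C4, capital DZ with caron
title_dz = dz.title()         # U+01C5, capital D with small z with caron


def length_changes(s):
    """True if upper-casing s changes its length (hypothetical helper)."""
    return len(s.upper()) != len(s)
```

A per-character toupper() that must return exactly one character cannot express the first mapping, and has no slot at all for the title-case form.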

However, the case conversions do seem to be canonical, and not
hopelessly language dependent. There are some differences,
but I don't see why an "isspace" function couldn't
work correctly for all of Unicode/all languages.
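
As a rough sketch of what a locale-independent "isspace" could look like, Python 3's str.isspace() consults the Unicode character database rather than the current locale. The list of sample characters below is my own choice for the example.

```python
import unicodedata

# Space characters from different scripts and typographic traditions.
SPACES = [
    "\u0020",  # SPACE
    "\u00a0",  # NO-BREAK SPACE
    "\u2003",  # EM SPACE
    "\u3000",  # IDEOGRAPHIC SPACE
]

# str.isspace() answers from Unicode properties, not from LC_CTYPE.
all_space = all(ch.isspace() for ch in SPACES)

# All four share the general category Zs (space separator).
categories = {unicodedata.category(ch) for ch in SPACES}
```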

The variations I see are whether or not to enable certain
options, such as "German GG-->bisse mappings", and
"EastAsian wcwidth" (those could be in locale, I guess...).

The main point I'm getting at is that even if I'm in en_US.UTF-8,
why can't the upper/lower converter make an effort for the
other languages, such as Vietnamese, which have case conversions
that are obvious to any roman-alphabet user?
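
For what it is worth, this is the behaviour I have in mind, sketched in Python 3, whose str.upper() and str.lower() ignore the locale entirely. The sample phrase is just an illustration.

```python
import unicodedata

# "tieng viet" with its diacritics, written with escapes so the exact
# code points are unambiguous (precomposed, NFC).
phrase = "ti\u1ebfng vi\u1ec7t"

# No Vietnamese locale is selected anywhere; the mapping comes from
# the Unicode character database.
shouted = phrase.upper()

# Names of the two accented letters after conversion.
names = [unicodedata.name(ch) for ch in shouted if ord(ch) > 0x7F]
```

Converting back with shouted.lower() returns the original phrase, so the round trip is lossless for this text.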

Duplicating the full case conversion tables for all installed
locales does seem a bit redundant... Instead maybe a small file
like:

UnicodeVersion = Latest
Normalization-Form = NFC
German-bisse = OFF
EastAsianWidths = ON
etc...
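
A rough sketch of how such a file might be consumed, in Python 3. The file format, the key names, and the functions parse_settings and display_width are all hypothetical, taken from the example above rather than from any existing locale tool. The width rule is deliberately simplified and ignores zero-width and combining characters.

```python
import unicodedata

# The sample settings from above, as they might appear in the file.
SAMPLE = """\
UnicodeVersion = Latest
Normalization-Form = NFC
German-bisse = OFF
EastAsianWidths = ON
"""


def parse_settings(text):
    """Parse 'Key = Value' lines into a dict; lines without '=' are skipped."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def display_width(ch, settings):
    """Columns for one character: 2 for East Asian wide/fullwidth if enabled."""
    wide = unicodedata.east_asian_width(ch) in ("W", "F")
    if wide and settings.get("EastAsianWidths") == "ON":
        return 2
    return 1


settings = parse_settings(SAMPLE)

# Apply the configured normalization form to "e" + COMBINING ACUTE ACCENT.
normalized = unicodedata.normalize(settings["Normalization-Form"], "e\u0301")
```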

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
