RE: mixing LANG and LC_CTYPE

Jungshik Shin Tue, 10 Dec 2002 08:38:27 -0800

On Tue, 10 Dec 2002, Maiorana, Jason wrote:

> >> Should a combination like LANG=fr_FR LC_CTYPE=en_US.UTF-8
> >> result in something equivalent to LANG=fr_FR.UTF-8?

  Even in theory, no, if there are differences between French and
English in character classification, case conversion and so forth.
Why not just use 'LANG=fr_FR.UTF-8' if that's what you want?

> what about
> LANG=fr_FR.UTF-8
> LC_CTYPE=en_US.UTF-8
> ?

  Nothing wrong with this. All LC_* categories other than LC_CTYPE would
follow LANG, but is that what he wants?
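
  To make the lookup order concrete, here is a minimal sketch of how the
effective locale for one category is usually resolved under POSIX rules:
LC_ALL first, then the category's own variable, then LANG. The function
name effective_locale and the fallback to "C" are assumptions for
illustration only; the real default is implementation-defined.

```python
def effective_locale(category, env):
    """Return the locale name that applies to one category (POSIX order)."""
    # Precedence: LC_ALL, then the category's own variable, then LANG.
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:  # unset or empty variables are skipped
            return value
    return "C"  # assumed fallback; POSIX leaves the default to the system


# The combination asked about above, passed in as a plain dict.
env = {"LANG": "fr_FR.UTF-8", "LC_CTYPE": "en_US.UTF-8"}
for cat in ("LC_CTYPE", "LC_COLLATE", "LC_TIME", "LC_MESSAGES"):
    print(cat, "->", effective_locale(cat, env))
```

  With these settings only LC_CTYPE resolves to en_US.UTF-8; the other
categories resolve to fr_FR.UTF-8.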

> for UTF-8, the ctype information would be the same, right?
> (case, whitespace, etc )

  No, it's language- and region-dependent.

> If this is not the case, is there any locale which will correctly
> ctype() all of unicode?

  There's NO single 'correct' way, although there can be a 'generic' default.
isupper, islower, toupper, tolower and so forth work differently depending
on the language/region of the locale.
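
  A well-known case is Turkish (and Azerbaijani), where the uppercase of
'i' is U+0130 (capital I with dot above) rather than 'I'. The sketch below
only illustrates that one rule; upper_for_language is a hypothetical helper,
not a library function, and it takes a bare language code rather than a
real locale name.

```python
def upper_for_language(text, lang):
    """Uppercase text, applying the Turkish/Azerbaijani dotted-i rule."""
    if lang in ("tr", "az"):
        # In these languages 'i' uppercases to U+0130, not to 'I'.
        text = text.replace("i", "\u0130")
    # Python's str.upper() is the generic, language-neutral mapping.
    return text.upper()


print(upper_for_language("title", "en"))  # TITLE
print(upper_for_language("title", "tr"))  # T\u0130TLE (dotted capital I)
```

  The same input gives two different results, which is why one ctype
table cannot be right for every language even when the encoding is UTF-8.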

> When programming, I avoid the ctype function itself. I think its
> better to convert to utf-8 on input (if its not already) and
> use generic unicode ctype functions.

   The key word here is 'generic': a generic mapping is not applicable to
all languages all the time.

  Jungshik

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/