Re: POSIX character classes

Thorsten Glaser Fri, 24 Mar 2017 14:42:21 -0700

Martijn Dekker dixit:

>> Can I get by making them match ASCII only even in UTF-8 mode?
>
>IMHO, that would defeat their primary purpose, namely locale-dependent
>class matching, so no, not really. :)
>
>If Greeks or Russians (or Germans, for that matter) can't count on
>[:upper:] matching an upper case letter in their alphabets, then I'd say


There are no alphabets in UTF-8, only global Unicode.

>> Strictly speaking, POSIX requires only support for the C locale,
>[...]
>
>Yes, but on systems supporting other locales (e.g. UTF-8), it would not
>be conforming for character classes to match ASCII only. You either
>support UTF-8 or you don't.

For POSIX purposes, we really don’t, as we use our own routines
to read and write multibyte characters and handle them as wide
characters internally. We _really_ cannot use POSIX locales in
mksh at all. So if a system has 32-bit wchar_t and supports the
Unicode astral planes, mksh isn’t conforming in UTF-8 mode there
either. (POSIX does not, however, demand UTF-8 or Unicode support
at all, only the C locale, so that’s okay.)


The question was more whether [[:upper:]] matching [A-Z] would
be more useful than not matching anything at all.
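
To make the ASCII-only option concrete, here is a minimal sketch in C.
It is not mksh code: the function name ascii_class_match and its
signature are made up for illustration, it covers only two classes,
and it assumes the character has already been decoded to a code point.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not taken from mksh): ASCII-only fallback for
 * POSIX character classes. wc is an already-decoded code point; any
 * value outside the ASCII ranges below never matches.
 */
bool
ascii_class_match(const char *name, unsigned int wc)
{
	assert(name != NULL);
	if (strcmp(name, "upper") == 0)
		return (wc >= 0x41 && wc <= 0x5A);	/* A-Z */
	if (strcmp(name, "lower") == 0)
		return (wc >= 0x61 && wc <= 0x7A);	/* a-z */
	/* classes this sketch does not know match nothing */
	return (false);
}
```

With this, [[:upper:]] behaves like [A-Z] even in UTF-8 mode: U+0041
matches, while U+0391 (Greek capital alpha) does not.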

bye,
//mirabilos
-- 
“It is inappropriate to require that a time represented as
 seconds since the Epoch precisely represent the number of
 seconds between the referenced time and the Epoch.”
        -- IEEE Std 1003.1b-1993 (POSIX) Section B.2.2.2
