Martijn Dekker dixit:

>> Can I get by making them match ASCII only even in UTF-8 mode?
>
>IMHO, that would defeat their primary purpose, namely locale-dependent
>class matching, so no, not really. :)
>
>If Greeks or Russians (or Germans, for that matter) can't count on
>[:upper:] matching an upper case letter in their alphabets, then I'd say

There’s no alphabets in UTF-8, only global Unicode.

>> Strictly speaking, POSIX requires only support for the C locale,
>[...]
>
>Yes, but on systems supporting other locales (e.g. UTF-8), it would not
>be conforming for character classes to match ASCII only. You either
>support UTF-8 or you don't.

For POSIX purposes, we really don’t, as we use our own routines to
read and write multibyte characters and handle them as wide characters
internally. We _really_ cannot use POSIX locales in mksh at all.

So if a system has 32-bit wchar_t and supports the Unicode astral
planes, mksh isn’t conforming in UTF-8 mode there either. (POSIX does,
however, not demand UTF-8 or Unicode support at all, only the C locale,
so that’s okay.)

The question was more whether [[:upper:]] matching [A-Z] would be more
useful than not matching anything at all.

bye,
//mirabilos
-- 
“It is inappropriate to require that a time represented as
 seconds since the Epoch precisely represent the number of
 seconds between the referenced time and the Epoch.”
	-- IEEE Std 1003.1b-1993 (POSIX) Section B.2.2.2