There are plans for this: having a POSIX.UTF-8 locale as an XSI base requirement. There may be POSIX.UTF-E and UTF-I locales too, with the same features and simply different charmaps. As options there may even be a POSIX.ISO-7 and a POSIX.ISO-8 specification as well, although this is unlikely, as no platform I'm aware of fully supports ISO-6429 now. Because C11 and C17 are fundamentally broken, with only a minimal partial fix slated for C2x, I have never seen a viable plan for a C.UTF-8 or C.UTF-E proposal.
However, as the standard is written now, only the repertoire that transforms to a single-byte encoding may be used, and that is what the C2x fix limits itself to. This is effectively normative support for ASCII-68 only, not ISO-646 or 10646. Expanding support to include some of the 2-byte graphic repertoire is already permitted by POSIX, but not required. Making allowances for most of the UCS-2 repertoire is fairly easy, including its 3-byte UTF-8 representations, but the text for this, and for the significant changes the 4-byte form needs for full UCS-4 and UTF-16 support, has still to be proposed.

The point is that it is still too early, in my opinion, to say what additional capabilities these locales will provide to applications to ease multi-lingual portability. Of the four choices I see the second or third as the minimum desirable. The industry as a whole needs to communicate how much of Unicode it wants supported in Issue 8, or it will be stuck with the minimum represented by ASCII-68. Whatever is decided upon, expect bug fixes and breaking changes to non-portable aspects of existing implementations so that they conform to the final formal specification of the locale.

On Thursday, June 25, 2020 Ingo Schwarze <schwa...@usta.de> wrote:

Hi Alan,

Alan Coopersmith wrote on Thu, Jun 25, 2020 at 07:59:39AM -0700:
> On 6/25/20 6:33 AM, Hans Aberg wrote:
>> Perhaps there should be a default UTF-8 locale: It seems that the
>> current construct does not apply so well to it.
> If the goal is to standardize existing behavior the standard could define
> the C.UTF-8 locale (or perhaps a POSIX.UTF-8 locale) that a number of
> systems already have, which is the standard C/POSIX locale with just the
> character set changed to UTF-8 instead.

This idea makes a lot of sense to me.
If the Austin Group decides that it wants to go into that direction, i would make sure that both OpenBSD and the software i publish use that name for a locale with these properties and consistently recommend using that name.

Both already support a locale with these properties and select it if the user asks for C.UTF-8 or POSIX.UTF-8, but so far, they recommend that users specify en_US.UTF-8 (for historical reasons), which is a bit unfortunate because it looks like requesting cultural conventions for a particular country, which is not the intention.

Whether to standardize only C.UTF-8 or both C.UTF-8 and POSIX.UTF-8 as synonyms looks a bit like asking for the best colour of a bikeshed. Given that the standard already contains the redundancy of requiring both "C" and "POSIX", maybe it is more consistent to also require both "C.UTF-8" and "POSIX.UTF-8", but i don't think that matters greatly.

Yours,
Ingo