Package: src:glibc
Version: 2.24-17
Severity: wishlist
Tags: patch

Hi!
Here's a simple patch set to change the default of setlocale(…, "") to C.UTF-8. This is a drastically smaller change than altering the meaning of "C" to mean "C.UTF-8" that upstream is mulling over -- it affects only programs that already have locale support, when the user fails to set any.

If none of LC_ALL, LC_CTYPE or LANG is set, instead of taking this to mean "C" we assume "C.UTF-8". This is explicitly allowed by POSIX (an "implementation-defined default locale"). setlocale(…, "C") or not calling it at all retain the old meaning[1].

This is the approach already taken by musl. I'm not submitting this upstream first as C.UTF-8 is still a Debian-specific thing.

The improvement would be: if for any reason the user fails to set the locale, a daemon's startup script is too eager in clearing its environment, a build chroot fails to inherit env vars, etc -- in all of these cases we'll fall back to a UTF-8 locale. Making a locale-aware program use "C" is still fully possible by setting LC_ALL=C, but we won't suffer from non-UTF-8 by omission.

This is mostly a one-line patch (1/3); the other two update the testsuite (2/3) and alter the hard-coded output of /usr/bin/locale (3/3).


Meow!

[1]. Making "C" behave like "C.UTF-8" would be, according to my reading, compliant with both POSIX-2008@2016 and C11 except for a minor iswblank() weirdness, but this is not a part of this change.

-- System Information:
Debian Release: buster/sid
  APT prefers unstable-debug
  APT policy: (500, 'unstable-debug'), (500, 'unstable'), (500, 'testing'), (150, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 4.13.0-rc7-debug-ubsan-00220-g92222baeac7d (SMP w/6 CPU cores)
Locale: LANG=C.UTF-8, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE=C.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: sysvinit (via /sbin/init)

From 92d9938c6ba813afaf854d7bc12a9dc0c71371c3 Mon Sep 17 00:00:00 2001
From: Adam Borowski <kilob...@angband.pl>
Date: Sun, 3 Sep 2017 00:26:47 +0200
Subject: [PATCH 1/3] Default to C.UTF-8 on setlocale(..., "") if no env vars
 are set.

This doesn't affect programs that are not prepared to handle arbitrary
locales, as those either don't call setlocale() at all or use
setlocale(..., "C"); it affects merely programs which would have used a
proper locale had the user set it up.

This provides a decent default when env var configuration is missing, in
a way that's more robust than mucking with login defs and daemon startup
scripts.

A default locale other than "C" is allowed by POSIX; also, at least musl
uses an equivalent of C.UTF-8 already.
---
 locale/findlocale.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/locale/findlocale.c b/locale/findlocale.c
index 4cb9d5ea8a..2a12b4e808 100644
--- a/locale/findlocale.c
+++ b/locale/findlocale.c
@@ -123,8 +123,12 @@ _nl_find_locale (const char *locale_path, size_t locale_path_len,
                                  + _nl_category_name_idxs[category]);
       if (!name_present (cloc_name))
         cloc_name = getenv ("LANG");
+      /* If no env vars are set, we're free to choose an
+         "implementation-defined default locale":
+         http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08_02
+      */
       if (!name_present (cloc_name))
-        cloc_name = _nl_C_name;
+        cloc_name = "C.UTF-8";
     }
 
   /* We used to fall back to the C locale if the name contains a slash
-- 
2.14.1

From 612dc7f67f93882b7acb2f035b1cc200ceb2e153 Mon Sep 17 00:00:00 2001
From: Adam Borowski <kilob...@angband.pl>
Date: Sun, 3 Sep 2017 03:43:10 +0200
Subject: [PATCH 2/3] Adjust the setlocale test suite for C.UTF-8 as default.

---
 localedata/bug-setlocale1.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/localedata/bug-setlocale1.c b/localedata/bug-setlocale1.c
index 546ea7beb8..2c86e2361d 100644
--- a/localedata/bug-setlocale1.c
+++ b/localedata/bug-setlocale1.c
@@ -39,9 +39,9 @@ do_test (void)
   if (d == NULL)
     return 1;
 
-  if (strcmp (d, "C") != 0)
+  if (strcmp (d, "C.UTF-8") != 0)
     {
-      puts ("*** LC_NUMERIC not C");
+      puts ("*** LC_NUMERIC not C.UTF-8");
       result = 1;
     }
 
-- 
2.14.1

From fb6cc4a418c6278dfc2dcf45bc1ea40e06ef9caf Mon Sep 17 00:00:00 2001
From: Adam Borowski <kilob...@angband.pl>
Date: Sun, 3 Sep 2017 13:43:41 +0200
Subject: [PATCH 3/3] Change hard-coded value for "no defined vars" in
 /usr/bin/locale.

---
 locale/programs/locale.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/locale/programs/locale.c b/locale/programs/locale.c
index 9da3e5319f..131472766c 100644
--- a/locale/programs/locale.c
+++ b/locale/programs/locale.c
@@ -819,7 +819,7 @@ show_locale_vars (void)
         print_assignment (name,
                           lcall[0] != '\0' ? lcall
                           : lang[0] != '\0' ? lang
-                          : "POSIX",
+                          : "C.UTF-8",
                           true);
       else
         print_assignment (name, val, false);
-- 
2.14.1
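
For anyone wanting to check the behaviour by hand, here is a minimal sketch of a probe. It is not part of the patch set, and the helper names env_default_ctype and describe_env_default are made up for illustration. It unsets the three variables consulted for LC_CTYPE and then asks setlocale() for the environment default, which is the only code path the patches are meant to touch.

```c
#define _POSIX_C_SOURCE 200809L
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper, not part of the patch set: remove the variables
   that select LC_CTYPE, then request the environment default.  Returns
   the locale name chosen by libc, or NULL if setlocale() failed.  */
static const char *
env_default_ctype (void)
{
  unsetenv ("LC_ALL");
  unsetenv ("LC_CTYPE");
  unsetenv ("LANG");
  return setlocale (LC_CTYPE, "");
}

/* Hypothetical helper: write "<locale name> (charmap=<codeset>)" into
   BUF and return BUF.  The codeset is whatever the now-active LC_CTYPE
   reports through nl_langinfo().  */
static char *
describe_env_default (char *buf, size_t len)
{
  const char *name = env_default_ctype ();
  snprintf (buf, len, "%s (charmap=%s)",
            name != NULL ? name : "setlocale failed",
            nl_langinfo (CODESET));
  return buf;
}
```

Calling describe_env_default() from main() and printing the buffer should, if the patches work as described, name C.UTF-8 with a UTF-8 charmap; an unpatched glibc would be expected to name "C" instead. Programs that call setlocale(..., "C") or never call setlocale() are not exercised by this probe, as the patches leave them alone.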