Package: src:glibc
Version: 2.24-17
Severity: wishlist
Tags: patch

Hi!
Here's a simple patch set to change the default of setlocale(…, "") to
C.UTF-8.  This is a drastically smaller change than altering the meaning of
"C" to mean "C.UTF-8", which upstream is mulling over -- it affects only
programs that already have locale support, and only when the user has not
set a locale.

If none of LC_ALL, LANG, or LC_CTYPE is set, we assume "C.UTF-8" instead of
taking this to mean "C".  This is explicitly allowed by POSIX (an
"implementation-defined default locale").  setlocale(…, "C") and not calling
setlocale() at all both retain the old meaning[1].
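
For illustration, the lookup order described above can be modelled roughly
as follows.  This is a minimal sketch, not the glibc code:
resolve_locale_name(), is_set() and their parameters are hypothetical names,
and the environment values are passed in explicitly so the logic can be
exercised without touching the real environment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: NULL and "" both count as "not set".  */
static int
is_set (const char *value)
{
  return value != NULL && value[0] != '\0';
}

/* Hypothetical model of the lookup: LC_ALL wins, then the category's own
   variable (e.g. LC_CTYPE), then LANG, and only then the
   implementation-defined default passed in as `fallback`.  */
static const char *
resolve_locale_name (const char *lc_all, const char *lc_category,
                     const char *lang, const char *fallback)
{
  assert (fallback != NULL);
  if (is_set (lc_all))
    return lc_all;
  if (is_set (lc_category))
    return lc_category;
  if (is_set (lang))
    return lang;
  return fallback;
}
```

In this model the patch set only changes the value passed as `fallback`,
from "C" to "C.UTF-8"; every case where one of the variables is set
resolves exactly as before.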

This is the approach already taken by musl.

I'm not submitting this upstream first as C.UTF-8 is still a Debian-specific
thing.

The improvement: if the locale ends up unset for any reason (the user never
configured it, a daemon's startup script is too eager in clearing its
environment, a build chroot fails to inherit env vars, etc.), we fall back
to a UTF-8 locale.  Making a locale-aware program use "C" is still fully
possible by setting LC_ALL=C, but we won't suffer from non-UTF-8 by
omission.
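
To check which character set a locale-aware program actually ends up with,
something like the following could be used.  It is only a sketch:
codeset_for() is a hypothetical helper name, and it relies on nothing beyond
the standard setlocale() and POSIX nl_langinfo(CODESET) calls.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: switch LC_CTYPE to `name`, copy the resulting
   codeset name into buf, then restore the previous LC_CTYPE.
   Returns 0 on success, -1 if the locale cannot be selected or a
   buffer is too small.  */
static int
codeset_for (const char *name, char *buf, size_t buflen)
{
  char saved[256];
  const char *old;
  int rc = -1;

  assert (name != NULL && buf != NULL);

  old = setlocale (LC_CTYPE, NULL);
  if (old == NULL || strlen (old) >= sizeof saved)
    return -1;
  /* A later setlocale() call may overwrite the returned string.  */
  strcpy (saved, old);

  if (setlocale (LC_CTYPE, name) != NULL)
    {
      const char *cs = nl_langinfo (CODESET);
      if (strlen (cs) < buflen)
        {
          strcpy (buf, cs);
          rc = 0;
        }
    }

  setlocale (LC_CTYPE, saved);
  return rc;
}
```

Calling codeset_for ("", buf, sizeof buf) with LC_ALL, LC_CTYPE and LANG all
unset shows what the default gives; with this patch set applied it should
report UTF-8, while codeset_for ("C", ...) keeps reporting whatever "C" meant
before.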


This is mostly a one-line patch (1/3); the other two update the test suite
(2/3) and alter the hard-coded output of /usr/bin/locale (3/3).


Meow!

[1]. Making "C" behave like "C.UTF-8" would be, according to my reading,
compliant with both POSIX-2008@2016 and C11 except for a minor iswblank()
weirdness, but this is not a part of this change.
-- System Information:
Debian Release: buster/sid
  APT prefers unstable-debug
  APT policy: (500, 'unstable-debug'), (500, 'unstable'), (500, 'testing'), (150, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 4.13.0-rc7-debug-ubsan-00220-g92222baeac7d (SMP w/6 CPU cores)
Locale: LANG=C.UTF-8, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE=C.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: sysvinit (via /sbin/init)
From 92d9938c6ba813afaf854d7bc12a9dc0c71371c3 Mon Sep 17 00:00:00 2001
From: Adam Borowski <kilob...@angband.pl>
Date: Sun, 3 Sep 2017 00:26:47 +0200
Subject: [PATCH 1/3] Default to C.UTF-8 on setlocale(..., "") if no env vars
 are set.

This doesn't affect programs that are not prepared to handle arbitrary
locales, as those either don't call setlocale() at all or use setlocale(...,
"C"); it affects only programs which would have used a proper locale had the
user set it up.

This provides a decent default when env var configuration is missing, in a
way that's more robust than mucking with login defs and daemon startup
scripts.

A default locale other than "C" is allowed by POSIX; also at least musl
uses an equivalent of C.UTF-8 already.
---
 locale/findlocale.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/locale/findlocale.c b/locale/findlocale.c
index 4cb9d5ea8a..2a12b4e808 100644
--- a/locale/findlocale.c
+++ b/locale/findlocale.c
@@ -123,8 +123,12 @@ _nl_find_locale (const char *locale_path, size_t locale_path_len,
                            + _nl_category_name_idxs[category]);
       if (!name_present (cloc_name))
        cloc_name = getenv ("LANG");
+      /* If no env vars are set, we're free to choose an
+         "implementation-defined default locale":
+         http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08_02
+      */
       if (!name_present (cloc_name))
-       cloc_name = _nl_C_name;
+       cloc_name = "C.UTF-8";
     }
 
   /* We used to fall back to the C locale if the name contains a slash
-- 
2.14.1

From 612dc7f67f93882b7acb2f035b1cc200ceb2e153 Mon Sep 17 00:00:00 2001
From: Adam Borowski <kilob...@angband.pl>
Date: Sun, 3 Sep 2017 03:43:10 +0200
Subject: [PATCH 2/3] Adjust the setlocale test suite for C.UTF-8 as default.

---
 localedata/bug-setlocale1.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/localedata/bug-setlocale1.c b/localedata/bug-setlocale1.c
index 546ea7beb8..2c86e2361d 100644
--- a/localedata/bug-setlocale1.c
+++ b/localedata/bug-setlocale1.c
@@ -39,9 +39,9 @@ do_test (void)
   if (d == NULL)
     return 1;
 
-  if (strcmp (d, "C") != 0)
+  if (strcmp (d, "C.UTF-8") != 0)
     {
-      puts ("*** LC_NUMERIC not C");
+      puts ("*** LC_NUMERIC not C.UTF-8");
       result = 1;
     }
 
-- 
2.14.1

From fb6cc4a418c6278dfc2dcf45bc1ea40e06ef9caf Mon Sep 17 00:00:00 2001
From: Adam Borowski <kilob...@angband.pl>
Date: Sun, 3 Sep 2017 13:43:41 +0200
Subject: [PATCH 3/3] Change hard-coded value for "no defined vars" in
 /usr/bin/locale.

---
 locale/programs/locale.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/locale/programs/locale.c b/locale/programs/locale.c
index 9da3e5319f..131472766c 100644
--- a/locale/programs/locale.c
+++ b/locale/programs/locale.c
@@ -819,7 +819,7 @@ show_locale_vars (void)
          print_assignment (name,
                            lcall[0] != '\0' ? lcall
                            : lang[0] != '\0' ? lang
-                           : "POSIX",
+                           : "C.UTF-8",
                            true);
        else
          print_assignment (name, val, false);
-- 
2.14.1
