Ingo Schwarze wrote on Wed, Mar 28, 2018 at 03:23:30PM +0200:
> Karl Williamson wrote on Mon, Mar 19, 2018 at 11:51:31AM -0600:
>> I still believe that in my program the setlocale() returning C for
>> LC_ALL is a bug. LC_CTYPE should have successfully been set to Romanian
>> UTF-8, and so LC_ALL isn't C. Instead, it is a combination of C for all
>> the other categories, and UTF-8 for LC_CTYPE. A return of just "C"
>> doesn't reflect that complexity. There is a footnote in the ANSI/ISO
>> 9899-1990 C standard that the returned string must support that
>> heterogeneity, and that the return value be able to be used in a future
>> setlocale to get back to the original state. Your setlocale violates
>> the standard therefore, and harms your portability goal.
> So, here is a patch to fix that, together with two other minor bugs
> that i found while looking at my code, and two minor style cleanups.
Oops, i caused a regression here. changegl() still sets the LC_ALL
entry to "" rather than to NULL if the environment variable LC_ALL
is not defined, but the rest of the code would now expect NULL.
The simplest fix is to continue using "" everywhere to "mean LC_ALL
is not set". See below for the fixed patch.
> OK?
> Ingo
>
> Rationale:
> * chunk 39, first part (style):
> There is no need for calloc(3) because all _LC_LAST entries
> in newgl will be assigned to right away.
> * chunk 39, second part (main bugfix):
> The problem:
> In the sequence
> setlocale(LC_ALL, "A");
> setlocale(LC_CTYPE, "T");
> setlocale(LC_ALL, NULL);
> the last call returns "A" but ought to return "A/T/A/A/A/A".
> The root cause:
> setlocale(LC_ALL, "A") sets the LC_ALL entry in global_locale,
> but setlocale(LC_CTYPE, "T") neglects to clear it.
> The solution:
> dupgl() must never copy the LC_ALL entry.
> When we are setting a category other than LC_ALL, or when we
> are setting all categories with the LC_ALL, "/////" syntax,
> the old LC_ALL entry becomes invalid. When we are setting
> LC_ALL uniformly, it will be replaced anyway.
> * chunk 92 (first additonal minor bug):
> If strdup(locname) fails, newgl is leaked.
> * chunk 136 (style):
> Resolve code duplication.
> * chunk 184 (second additional minor bug):
> Consider the nonsensical
> setlocale(LC_ALL, "C/C/fr_FR.UTF-8/C/C/C");
> Of course, it makes no sense to request UTF-8 messages
> in ASCII encoding. But *if* that is requested, the code finds
> the dot in global_locname and switches the _GlobalRuneLocale
> to UTF-8, which is wrong, as "C" was requested for LC_CTYPE.
> Fix this by looking at the LC_CTYPE entry only when deciding
> what to use for _GlobalRuneLocale.
> There is no risk associated with this fix because we ignore
> LC_MESSAGES etc. anyway, so we know we wont actually try to
> print UTF-8 into an ASCII output stream.
Index: lib/libc/locale/setlocale.c
===================================================================
RCS file: /cvs/src/lib/libc/locale/setlocale.c,v
retrieving revision 1.27
diff -u -p -r1.27 setlocale.c
--- lib/libc/locale/setlocale.c 5 Sep 2017 03:16:13 -0000 1.27
+++ lib/libc/locale/setlocale.c 28 Mar 2018 14:28:20 -0000
@@ -39,11 +39,11 @@ dupgl(char **oldgl)
char **newgl;
int ic;
- if ((newgl = calloc(_LC_LAST, sizeof(*newgl))) == NULL)
+ if ((newgl = reallocarray(NULL, _LC_LAST, sizeof(*newgl))) == NULL)
return NULL;
for (ic = LC_ALL; ic < _LC_LAST; ic++) {
- if ((newgl[ic] = strdup(oldgl != NULL ?
- oldgl[ic] : ic == LC_ALL ? "" : "C")) == NULL) {
+ if ((newgl[ic] = strdup(ic == LC_ALL ? "" :
+ oldgl == NULL ? "C" : oldgl[ic])) == NULL) {
freegl(newgl);
return NULL;
}
@@ -92,8 +92,10 @@ setlocale(int category, const char *locn
if (category == LC_ALL && strchr(locname, '/') != NULL) {
/* One value for each category. */
- if ((firstname = strdup(locname)) == NULL)
+ if ((firstname = strdup(locname)) == NULL) {
+ freegl(newgl);
return NULL;
+ }
nextname = firstname;
for (ic = 1; ic < _LC_LAST; ic++)
if (nextname == NULL || changegl(ic,
@@ -136,22 +138,14 @@ setlocale(int category, const char *locn
goto done;
}
- /* Individual category. */
- if (category > LC_ALL) {
+ /* Individual category, or LC_ALL uniformly set. */
+ if (category > LC_ALL || newgl[LC_ALL][0] != '\0') {
if (strlcpy(global_locname, newgl[category],
sizeof(global_locname)) >= sizeof(global_locname))
global_locname[0] = '\0';
goto done;
}
- /* LC_ALL overrides everything else. */
- if (newgl[LC_ALL][0] != '\0') {
- if (strlcpy(global_locname, newgl[LC_ALL],
- sizeof(global_locname)) >= sizeof(global_locname))
- global_locname[0] = '\0';
- goto done;
- }
-
/*
* Check whether all categories agree and return either
* the single common name for all categories or a string
@@ -184,7 +178,7 @@ done:
global_locale = newgl;
if (category == LC_ALL || category == LC_CTYPE)
_GlobalRuneLocale =
- strchr(global_locname, '.') == NULL ?
+ strchr(newgl[LC_CTYPE], '.') == NULL ?
&_DefaultRuneLocale : _Utf8RuneLocale;
} else {
freegl(newgl);
Index: regress/lib/libc/locale/setlocale/setlocale.c
===================================================================
RCS file: /cvs/src/regress/lib/libc/locale/setlocale/setlocale.c,v
retrieving revision 1.3
diff -u -p -r1.3 setlocale.c
--- regress/lib/libc/locale/setlocale/setlocale.c 25 Feb 2017 07:28:32
-0000 1.3
+++ regress/lib/libc/locale/setlocale/setlocale.c 28 Mar 2018 14:28:20
-0000
@@ -75,7 +75,6 @@ main(int argc, char *argv[])
/* load from env */
/* NOTE: we don't support non-C locales for some categories */
- /*test_setlocale("fr_FR.UTF-8", LC_ALL, "");*/ /* set */
test_setlocale("fr_FR.UTF-8", LC_CTYPE, ""); /* set */
test_setlocale("fr_FR.UTF-8", LC_MESSAGES, ""); /* set */
test_MB_CUR_MAX(4);
@@ -113,6 +112,7 @@ main(int argc, char *argv[])
test_setlocale("C", LC_ALL, "C"); /* reset */
test_setlocale("invalid.UTF-8", LC_CTYPE, "invalid.UTF-8"); /* set */
test_setlocale("invalid.UTF-8", LC_CTYPE, NULL);
+ test_setlocale("C/invalid.UTF-8/C/C/C/C", LC_ALL, NULL);
test_MB_CUR_MAX(4);
/* with invalid codeset (is an error) */