This has been reported by multiple sources, and we at Perl 5 have
diagnosed the problem.

If LC_CTYPE is set to C.UTF-8, it is not possible to set any other
category independently to either C or C.UTF-8 without inadvertently
setting LC_CTYPE back to C. The attached program demonstrates the problem.

Other operating systems don't have this problem because they don't
ignore the third parameter to newlocale(). Perhaps you didn't consider
this scenario when you made the decision to ignore it.

But you can still ignore it, and get proper behavior, I believe. From
looking at execution traces, it appears that there are two underlying
locales supported by OpenBSD: 1 and 2. The locale objects are ints.

  Locale 1 is the C locale for all categories.
  Locale 2 is C.UTF-8 for LC_CTYPE and C for all other categories.

The problem is that calling newlocale() with a mask that does not
include LC_CTYPE_MASK and the name "C" causes it to return 1, regardless
of the current state of LC_CTYPE. Then uselocale() is called with 1, and
LC_CTYPE gets changed to C.

I believe the following behavior in newlocale() would fix the problem:

  If the mask contains LC_CTYPE_MASK, select locale 1 or 2 as
  appropriate for the requested name.
  If the mask doesn't contain LC_CTYPE_MASK, do nothing, and return 1 or 2
  depending on LC_CTYPE.

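The two rules can be sketched as a small model. This is not OpenBSD's
code: every name in it (sketch_newlocale, the SKETCH_* constants) is
made up for illustration, the mask bits are placeholders rather than the
real LC_*_MASK values, and it assumes that "depending on LC_CTYPE" means
the LC_CTYPE currently in effect, passed in here as an argument.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the two locale values seen in the traces. */
enum { SKETCH_LOCALE_C = 1, SKETCH_LOCALE_UTF8 = 2 };

/* Hypothetical mask bits; the real LC_*_MASK values come from <locale.h>. */
#define SKETCH_CTYPE_MASK   0x1
#define SKETCH_NUMERIC_MASK 0x2

/*
 * Model of the proposed behavior.
 * current: the locale value now in effect (1 or 2).
 * Returns the value the proposed newlocale() would hand back.
 */
static int
sketch_newlocale(int mask, const char *name, int current)
{
	assert(name != NULL);
	assert(current == SKETCH_LOCALE_C || current == SKETCH_LOCALE_UTF8);

	if (mask & SKETCH_CTYPE_MASK) {
		/* Assumption: any name containing "UTF-8" selects locale 2. */
		return strstr(name, "UTF-8") != NULL
		    ? SKETCH_LOCALE_UTF8 : SKETCH_LOCALE_C;
	}

	/* LC_CTYPE was not requested: leave it alone, report what is in effect. */
	return current;
}
```

With this model, sketch_newlocale(SKETCH_NUMERIC_MASK, "C", 2) yields 2,
so a following uselocale() would leave LC_CTYPE at C.UTF-8.
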
I have a suspicion, without having looked at your code, that you already
do all this except for considering LC_CTYPE when deciding what
newlocale's return value should be.

I don't think you have to save the name of the locale the user used in
the call to newlocale(), unlike setlocale() where you have to
regurgitate it if asked, so I don't know why you suggest that it's best
to call newlocale with a third parameter of zero. That harms portability.

#include <stdlib.h>
#include <stdio.h>
#include <locale.h>

int
main (int argc, char ** argv)
{
    /* Setting LC_CTYPE to C.UTF-8 properly makes a UTF-8 locale */
    locale_t ctype = newlocale(LC_CTYPE_MASK, "C.UTF-8", (locale_t) 0);
    uselocale(ctype);
    fprintf(stderr, "MB_CUR_MAX after ctype=%zu\n", MB_CUR_MAX);

    /* But then setting LC_NUMERIC destroys the LC_CTYPE setting. Setting a
     * particular category should not affect the current settings of any other
     * category. On other systems, the third parameter to newlocale() would
     * not have been ignored, but on OpenBSD that argument is ignored. */
    locale_t numeric = newlocale(LC_NUMERIC_MASK, "C.UTF-8", ctype);
    uselocale(numeric);
    fprintf(stderr, "MB_CUR_MAX after numeric=%zu\n", MB_CUR_MAX);
}