This has been reported by multiple sources, and we at Perl 5 have diagnosed the problem.

If LC_CTYPE is set to C.UTF-8, it is not possible to set any other category independently to either C or C.UTF-8 without inadvertently setting LC_CTYPE back to C. The attached program demonstrates the problem.

Other operating systems don't have this problem because they don't ignore the third parameter to newlocale(). Perhaps you didn't consider this scenario when you made the decision to ignore it.

But you can still ignore it, and get proper behavior, I believe. From looking at execution traces, it appears that there are two underlying locales supported by OpenBSD: 1 and 2. The locale objects are ints.

Locale 1 is the C locale for all categories.
Locale 2 is C.UTF-8 for LC_CTYPE and C for all other categories.

The problem is that calling newlocale() with a mask that doesn't contain LC_CTYPE_MASK and the name "C" causes it to return 1, regardless of the current state of LC_CTYPE. Then uselocale() is called with 1, and LC_CTYPE gets changed to C.

I believe the following behavior in newlocale() would fix the problem:

If the mask contains LC_CTYPE_MASK, set the locale to 1 or 2 as appropriate.

If the mask doesn't contain LC_CTYPE_MASK, do nothing, and return 1 or 2 depending on LC_CTYPE.
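
To make the two rules above concrete, here is a small model of them. It is only a sketch: I haven't looked at your code, so every name in it (sketch_newlocale, the SKETCH_* constants) is made up for illustration, the mask bits are not the real values from <locale.h>, and I'm assuming "depending on LC_CTYPE" means the LC_CTYPE state currently in effect, passed in here as 'current'.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the two underlying locales seen in the traces. */
enum { SKETCH_LOCALE_C = 1, SKETCH_LOCALE_UTF8 = 2 };

/* Hypothetical mask bits; the real ones come from <locale.h>. */
enum { SKETCH_CTYPE_MASK = 1 << 0, SKETCH_NUMERIC_MASK = 1 << 1 };

/* Nonzero if the locale name carries a ".UTF-8" codeset, as in "C.UTF-8". */
int
sketch_name_is_utf8(const char *name)
{
    const char *dot = strchr(name, '.');
    return dot != NULL && strcmp(dot + 1, "UTF-8") == 0;
}

/* Model of the proposed behavior.  'current' is the locale (1 or 2) in
 * effect before the call; this is an assumption about where the LC_CTYPE
 * state would come from. */
int
sketch_newlocale(int mask, const char *name, int current)
{
    assert(name != NULL);
    assert(current == SKETCH_LOCALE_C || current == SKETCH_LOCALE_UTF8);

    /* Rule 1: the mask names LC_CTYPE, so the name decides between 1 and 2. */
    if (mask & SKETCH_CTYPE_MASK)
        return sketch_name_is_utf8(name) ? SKETCH_LOCALE_UTF8 : SKETCH_LOCALE_C;

    /* Rule 2: LC_CTYPE is not being changed, so keep whatever it was. */
    return current;
}
```

With this model, asking for LC_NUMERIC "C" while locale 2 is in effect yields 2 again, so LC_CTYPE stays C.UTF-8.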

I have a suspicion, without having looked at your code, that you already do all this except for considering LC_CTYPE when deciding what newlocale's return value should be.

I don't think you have to save the name of the locale the user used in the call to newlocale(), unlike setlocale() where you have to regurgitate it if asked, so I don't know why you suggest that it's best to call newlocale with a third parameter of zero. That harms portability.
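
For reference, this is roughly the kind of portable caller that depends on the third parameter. It is a sketch, not Perl's actual code: sketch_set_categories is a hypothetical helper, and it assumes the thread's previous locale object is not referenced anywhere else, so it can be freed once replaced.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: change only the categories in 'mask' for the calling
 * thread, leaving every other category as it is.
 * Returns 0 on success, -1 on failure. */
int
sketch_set_categories(int mask, const char *name)
{
    assert(name != NULL);

    /* Copy the thread's current locale (possibly the global one) so the
     * copy can serve as the base object. */
    locale_t base = duplocale(uselocale((locale_t) 0));
    if (base == (locale_t) 0)
        return -1;

    /* On success newlocale() takes over 'base' and may modify or free it,
     * so 'base' must not be used again. */
    locale_t updated = newlocale(mask, name, base);
    if (updated == (locale_t) 0) {
        freelocale(base);       /* on failure 'base' is left untouched */
        return -1;
    }

    locale_t previous = uselocale(updated);
    if (previous == (locale_t) 0) {
        freelocale(updated);    /* could not install the new locale */
        return -1;
    }

    /* Assumption: nothing else still refers to the previous object. */
    if (previous != LC_GLOBAL_LOCALE)
        freelocale(previous);
    return 0;
}
```

On a system that honors the base argument, calling sketch_set_categories(LC_NUMERIC_MASK, "C") after LC_CTYPE has been set to C.UTF-8 is expected to leave LC_CTYPE alone; on OpenBSD as it stands, it would reset it.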

#include <stdlib.h>
#include <stdio.h>
#include <locale.h>

int
main (int argc, char ** argv)
{
    /* Setting LC_CTYPE to C.UTF-8 properly makes a UTF-8 locale */
    locale_t ctype = newlocale(LC_CTYPE_MASK, "C.UTF-8", (locale_t) 0);
    uselocale(ctype);
    fprintf(stderr, "MB_CUR_MAX after ctype=%zu\n", MB_CUR_MAX);

    /* But then setting LC_NUMERIC destroys the LC_CTYPE setting.  Setting a
     * particular category should not affect the current settings of any other
     * category.  On other systems, the third parameter to newlocale() would
     * not have been ignored, but on OpenBSD that argument is ignored. */
    locale_t numeric = newlocale(LC_NUMERIC_MASK, "C.UTF-8", ctype);
    uselocale(numeric);
    fprintf(stderr, "MB_CUR_MAX after numeric=%zu\n", MB_CUR_MAX);

    return 0;
}
