Re: supporting in the UTF-8 environment on native Windows

Lasse Collin Tue, 24 Dec 2024 13:12:04 -0800

On 2024-12-23 Bruno Haible wrote:
> Lasse Collin reported in
> <https://lists.gnu.org/archive/html/bug-gettext/2024-12/msg00111.html>
> that the setlocale() override from GNU libintl does not support the
> UTF-8 environment of native Windows correctly. That setlocale()
> override is based on the setlocale() override from gnulib. So let me
> add that support here.


Thanks! I looked at the commits but I didn't test anything yet.

(1)
In 9f7ff4f423cd ("localename-unsafe: Support the UTF-8 environment on
native Windows."), the N(name) macro is used with strings that include
@modifier. For example, N("az_AZ@cyrillic") can expand to
"az_AZ@cyrillic.65001". Similarly in 00211fc69c92 ("setlocale: Support
the UTF-8 environment on native Windows."), ".65001" is appended after
the @modifier. However, the typical order would be az_AZ.UTF-8@cyrillic.
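
To illustrate the ordering, here is a minimal sketch of placing the
codeset before the @modifier. insert_codeset is a hypothetical helper
written for this example; it is not a function from Gnulib or libintl,
and it assumes the input name carries no codeset yet.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of Gnulib: write NAME with CODESET
   (for example ".UTF-8") inserted before any @modifier into BUF.
   Assumes NAME does not already contain a codeset.
   Returns 0 on success, -1 if BUF is too small.  */
static int
insert_codeset (const char *name, const char *codeset,
                char *buf, size_t bufsize)
{
  const char *at = strchr (name, '@');
  /* Length of the part before the modifier, or of the whole name.  */
  size_t base_len = at != NULL ? (size_t) (at - name) : strlen (name);
  int n = snprintf (buf, bufsize, "%.*s%s%s",
                    (int) base_len, name, codeset,
                    at != NULL ? at : "");
  return n >= 0 && (size_t) n < bufsize ? 0 : -1;
}
```

With name "az_AZ@cyrillic" and codeset ".UTF-8", the buffer receives
"az_AZ.UTF-8@cyrillic"; a name without a modifier such as "en_US"
becomes "en_US.UTF-8".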

I suppose you had a reason to use .65001 instead of .UTF-8 or .utf8.
I expect identical behavior from those. The MS setlocale() docs use
variants of .UTF8:

    https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/setlocale-wsetlocale?view=msvc-170#utf-8-support

(2)
In 2f4391fde862 ("setlocale tests: Test in the UTF-8 environment on
native Windows."), the condition

    (strlen (name) > 6 && strcmp (name + strlen (name) - 6, ".UTF-8") == 0)

matches the two long strings below it too, making those two extra
strcmp calls redundant.
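
A small sketch of why the suffix test alone is enough.
ends_with_dot_utf8 is a hypothetical name that wraps the same condition
as above, and the long locale name in the note below is an assumed
example, since the two strings from the test are not quoted here.

```c
#include <stdbool.h>
#include <string.h>

/* Same test as the quoted condition: NAME is longer than 6 characters
   and its last 6 characters are ".UTF-8".  */
static bool
ends_with_dot_utf8 (const char *name)
{
  size_t len = strlen (name);
  return len > 6 && strcmp (name + len - 6, ".UTF-8") == 0;
}
```

Any name that ends in ".UTF-8", whether short like "en_US.UTF-8" or
long like "English_United States.UTF-8", already satisfies this
condition, so comparing against such names again adds nothing.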

(3)
When a manifest is added via a resource file, a possible default
manifest from the toolchain is replaced; they aren't merged. For
example, on MSYS2, the mingw-w64-ucrt-x86_64-gcc package depends on
mingw-w64-ucrt-x86_64-windows-default-manifest. The manifest comes from
Cygwin:

    https://sourceware.org/git/?p=cygwin-apps/windows-default-manifest.git;a=blob;f=default-manifest.rc

Omitting the <compatibility> section makes the application run with
Vista as the Operating System Context. Omitting the <trustInfo> section
makes Windows treat the application as not UAC compliant, that is, a
pre-Vista app that needs compatibility tricks.

These probably don't matter for the current tests. I still suggest
changing it, because it is an odd combination to have UTF-8 without
marking the app compatible with recent Windows versions.

(4)
The output from windres goes to a file with the .res suffix but the
format is overridden with --output-format=coff. This looks weird
because windres defaults to --output-format=res for files that use the
.res suffix. For coff, the .o suffix would be logical, and the
--output-format option wouldn't be needed.

See the paragraphs near the beginning of the info node
(binutils)windres. A simple command should be enough:

    windres input.rc output.o

> In fact, there are apparently two variants of this mode:
>   - the legacy Windows settings variant: when you haven't ever
>     (or recently?) changed the system default locale of Windows 10,
>   - the modern Windows settings variant: when you have changed
>     the system default locale of Windows 10.
> With the legacy Windows settings, the setlocale() function produces
> locale names such as "English_United States.65001" or
> "English_United States.utf8". With the modern Windows settings, it
> produces "en_US.UTF-8" instead. (This is with both mingw and MSVC,
> according to my testing.)

I don't know enough about Windows to comment much. I only tested on one
Win10 system which returned the long spellings.
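
For anyone who wants to check what their own system reports, a minimal
sketch follows. native_locale_name is a hypothetical wrapper; it only
calls the standard setlocale() with an empty locale string and returns
the result, which is NULL if the request cannot be honored.

```c
#include <locale.h>
#include <stddef.h>

/* Hypothetical wrapper: select the environment's default locale for
   all categories and return the name that setlocale() reports, or
   NULL on failure.  */
static const char *
native_locale_name (void)
{
  return setlocale (LC_ALL, "");
}
```

Printing the returned string (after checking for NULL) shows whether a
given system uses the long spelling or the short one.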

If native setlocale(LC_ALL, "") can indeed result in "en_US" or
"en_US.UTF-8", I wonder if it can result in "az-Cyrl_AZ.UTF-8" too. I
don't see how Gnulib or Gettext would map such a locale name to
az_AZ.UTF-8@cyrillic. (az_AZ@cyrillic was the first one with @ in
localename-unsafe.c, thus I looked at that in MS docs too.)

The codeset seems to be a part of the language name:

    https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-lcid/a9eac961-e77d-41a6-90a5-ce1a8b0cdb9c

Locale format doesn't use @modifier:

    https://learn.microsoft.com/en-us/cpp/c-runtime-library/locale-names-languages-and-country-region-strings?view=msvc-170

-- 
Lasse Collin
