On 2024-12-23 Bruno Haible wrote:
> Lasse Collin reported in
> <https://lists.gnu.org/archive/html/bug-gettext/2024-12/msg00111.html>
> that the setlocale() override from GNU libintl does not support the
> UTF-8 environment of native Windows correctly. That setlocale()
> override is based on the setlocale() override from gnulib. So let me
> add that support here.
Thanks! I looked at the commits but I didn't test anything yet.
(1)
In 9f7ff4f423cd ("localename-unsafe: Support the UTF-8 environment on
native Windows."), the N(name) macro is used with strings that include
@modifier. For example, N("az_AZ@cyrillic") can expand to
"az_AZ@cyrillic.65001". Similarly in 00211fc69c92 ("setlocale: Support
the UTF-8 environment on native Windows."), ".65001" is appended after
the @modifier. However, the typical order would be az_AZ.UTF-8@cyrillic.
I suppose you had a reason to use .65001 instead of .UTF-8 or .utf8.
I expect identical behavior from those. The MS setlocale() docs use
variants of .UTF8:
https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/setlocale-wsetlocale?view=msvc-170#utf-8-support
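To illustrate the ordering I mean, here is a minimal sketch that puts the codeset before a possible @modifier instead of appending it at the end. The helper name add_codeset is made up for this mail; it isn't from Gnulib or Gettext, and I haven't tried to fit it into the N(name) macro.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from Gnulib): write NAME into BUF with
   ".CODESET" inserted before a possible "@modifier".
   Return 0 on success, -1 if BUF is too small.  */
static int
add_codeset (char *buf, size_t bufsize, const char *name, const char *codeset)
{
  assert (buf != NULL && name != NULL && codeset != NULL);

  /* Everything before '@' (or the whole string) is language_TERRITORY.  */
  const char *at = strchr (name, '@');
  size_t base_len = (at != NULL ? (size_t) (at - name) : strlen (name));

  int n = snprintf (buf, bufsize, "%.*s.%s%s",
                    (int) base_len, name, codeset, at != NULL ? at : "");
  return (n >= 0 && (size_t) n < bufsize) ? 0 : -1;
}
```

With name "az_AZ@cyrillic" and codeset "UTF-8" this gives "az_AZ.UTF-8@cyrillic"; names without @modifier simply get the codeset appended.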
(2)
In 2f4391fde862 ("setlocale tests: Test in the UTF-8 environment on
native Windows."), the condition
(strlen (name) > 6 && strcmp (name + strlen (name) - 6, ".UTF-8") == 0)
matches the two long strings below it too, making those two extra
strcmp calls redundant.
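A small sketch of why the extra comparisons can't change the result. The function names and the idea of passing the long string as an argument are only for illustration; the strings in the actual test are different.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Same condition as in the test: NAME is longer than ".UTF-8" and
   ends with ".UTF-8".  */
static bool
ends_with_utf8 (const char *name)
{
  assert (name != NULL);
  size_t len = strlen (name);
  return len > 6 && strcmp (name + len - 6, ".UTF-8") == 0;
}

/* Hypothetical variant with one extra exact comparison.  If LONG_NAME
   itself ends with ".UTF-8" (and is longer than 6 characters), the
   strcmp can only succeed when ends_with_utf8 already returned true.  */
static bool
ends_with_utf8_or_exact (const char *name, const char *long_name)
{
  return ends_with_utf8 (name) || strcmp (name, long_name) == 0;
}
```

So for any long_name that ends with ".UTF-8", both functions return the same value for every name.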
(3)
When a manifest is added via a resource file, a possible default
manifest from the toolchain is replaced; they aren't merged. For
example, on MSYS2, the mingw-w64-ucrt-x86_64-gcc package depends on
mingw-w64-ucrt-x86_64-windows-default-manifest. The manifest comes from
Cygwin:
https://sourceware.org/git/?p=cygwin-apps/windows-default-manifest.git;a=blob;f=default-manifest.rc
Omitting the <compatibility> section makes the application run with
Vista as the Operating System Context. Omitting the <trustInfo> section
makes Windows treat the application as not UAC compliant, that is, a
pre-Vista app that needs compatibility tricks.
Probably these don't matter for the current tests. I would still suggest
changing it, because it is an odd combination to have UTF-8 without
marking the app compatible with recent Windows versions.
(4)
The output from windres goes to a file with the .res suffix but the
format is overridden with --output-format=coff. This looks weird
because windres defaults to --output-format=res for files that use the
.res suffix. For coff, the .o suffix would be logical, and the
--output-format option wouldn't be needed.
See the paragraphs near the beginning of the info node
(binutils)windres. A simple command should be enough:
windres input.rc output.o
> In fact, there are apparently two variants of this mode:
> - the legacy Windows settings variant: when you haven't ever
> (or recently?) changed the system default locale of Windows 10,
> - the modern Windows settings variant: when you have changed
> the system default locale of Windows 10.
> With the legacy Windows settings, the setlocale() function produces
> locale names such as "English_United States.65001" or
> "English_United States.utf8". With the modern Windows settings, it
> produces "en_US.UTF-8" instead. (This is with both mingw and MSVC,
> according to my testing.)
I don't know enough about Windows to comment much. I only tested on one
Win10 system which returned the long spellings.
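For anyone who wants to check which spelling their own system reports, a minimal sketch is below. It only uses standard C; print_native_locale is a made-up name, and to see the UTF-8 variants the program would additionally need the UTF-8 manifest, which this sketch doesn't cover.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical probe: print what the native setlocale() returns for
   the environment's default locale.  Return 0 on success, -1 if
   setlocale() failed.  */
static int
print_native_locale (FILE *out)
{
  assert (out != NULL);

  const char *name = setlocale (LC_ALL, "");
  if (name == NULL)
    {
      fprintf (out, "setlocale (LC_ALL, \"\") failed\n");
      return -1;
    }

  fprintf (out, "LC_ALL: %s\n", name);
  return 0;
}
```

Calling print_native_locale (stdout) from main should be enough to see whether the long or the short spelling comes back.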
If native setlocale(LC_ALL, "") can indeed result in "en_US" or
"en_US.UTF-8", I wonder if it can result in "az-Cyrl_AZ.UTF-8" too. I
don't see how Gnulib or Gettext would map such a locale name to
az_AZ.UTF-8@cyrillic. (az_AZ@cyrillic was the first one with @ in
localename-unsafe.c, thus I looked at that in MS docs too.)
The codeset seems to be a part of the language name:
https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-lcid/a9eac961-e77d-41a6-90a5-ce1a8b0cdb9c
Locale format doesn't use @modifier:
https://learn.microsoft.com/en-us/cpp/c-runtime-library/locale-names-languages-and-country-region-strings?view=msvc-170
--
Lasse Collin