Hi Bruno,

This is interesting information. In recent months I have been working on a Windows-specific implementation of the POSIX newlocale/uselocale functions, as well as replacements for locale-related CRT functions.
I have collected some information about locale-related features in the Windows API and the CRT. I posted it to the mingw-w64 list, and you can find it here if you're interested [1]. For example, are you aware that CRTs starting with msvcr80.dll (in particular UCRT) natively support thread locales? I was wondering whether libintl/libiconv take this into account, because if not, this may lead to surprises.

- Kirill Makurin

[1] https://sourceforge.net/p/mingw-w64/mailman/message/59198335/

________________________________
From: [email protected] <[email protected]> on behalf of Bruno Haible via Gnulib discussion list <[email protected]>
Sent: Wednesday, September 17, 2025 12:45 AM
To: [email protected] <[email protected]>
Cc: Michele Locati <[email protected]>; Eli Zaretskii <[email protected]>
Subject: Document msvcrt (native Windows) bugs regarding console output

The stdio output functions have two bugs when it comes to output to a Windows console.

Windows consoles come with two encodings: GetACP() and GetOEMCP(). For Japanese, both have the same value (932). However, for English, German, French Windows installations, GetACP() = 1252 and GetOEMCP() = 850.

For many years, output of non-ASCII characters to consoles was a PITA: While the program had to produce output in GetACP() encoding when writing to files, it had to produce output in GetOEMCP() encoding when writing to a console. The majority of programs did not do this: they always produced output in GetACP() encoding, and thus non-ASCII characters got garbled in consoles.

After many, many years, Microsoft finally added a workaround in the C runtime library (msvcrt and ucrt). When a program writes a string, the runtime library tests whether the output goes to a console, and if so, it converts from GetACP() encoding to GetOEMCP() encoding on the fly, in two steps: from GetACP() to UTF-16 via MultiByteToWideChar, then to GetOEMCP() via WideCharToMultiByte.

This workaround works fine in ucrt.
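For illustration, the two-step conversion described above can be sketched roughly as follows. This is not the CRT's actual code; acp_to_oemcp is a hypothetical helper name, and the sketch assumes the source bytes are in the GetACP() encoding (CP_ACP) and the target is the GetOEMCP() encoding (CP_OEMCP). It compiles to nothing outside Windows.

```c
#ifdef _WIN32  /* MultiByteToWideChar/WideCharToMultiByte are Win32-only. */
#include <windows.h>
#include <stdlib.h>

/* Hypothetical helper (not a CRT function): converts src_len bytes from the
   GetACP() encoding to the GetOEMCP() encoding, going through UTF-16.
   Returns a malloc()ed buffer and stores its length in *out_len,
   or returns NULL on failure. The caller frees the result. */
char *acp_to_oemcp(const char *src, int src_len, int *out_len)
{
    char *out = NULL;

    /* Step 1: GetACP() encoding -> UTF-16. The first call only sizes the buffer. */
    int wlen = MultiByteToWideChar(CP_ACP, 0, src, src_len, NULL, 0);
    if (wlen <= 0)
        return NULL;
    wchar_t *wbuf = malloc((size_t)wlen * sizeof(wchar_t));
    if (wbuf == NULL)
        return NULL;

    if (MultiByteToWideChar(CP_ACP, 0, src, src_len, wbuf, wlen) == wlen) {
        /* Step 2: UTF-16 -> GetOEMCP() encoding. */
        int olen = WideCharToMultiByte(CP_OEMCP, 0, wbuf, wlen,
                                       NULL, 0, NULL, NULL);
        if (olen > 0 && (out = malloc((size_t)olen)) != NULL) {
            if (WideCharToMultiByte(CP_OEMCP, 0, wbuf, wlen,
                                    out, olen, NULL, NULL) == olen) {
                *out_len = olen;
            } else {
                free(out);
                out = NULL;
            }
        }
    }
    free(wbuf);
    return out;
}
#endif /* _WIN32 */
```

Note that a conversion like this needs complete multibyte characters in its input, so a caller that hands it one byte of a double-byte character at a time cannot get a correct result.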
But in msvcrt this workaround has two bugs. Both happen when
  - The output goes to a console.
    (No bug when the output goes to a file.)
and
  - The stream's mode is _O_TEXT.
    (Which is the default for stdout and stderr. No bug when the stream's mode is _O_BINARY.)
and
  - setlocale() is called before.
    (No bug if setlocale() is not called, that is, when the locale remains the "C" locale.)
and
  - The chosen locale has a double-byte encoding, such as CP932.
    (No bug for unibyte locale encodings, such as CP1252.)
and
  - The console's codepage matches the locale's encoding.
    For example, after 'chcp 932' was executed.

Bug 1: When the application outputs double-byte characters one byte at a time, using the functions fputc() or putc(), the console shows JISX0201 (ASCII and Katakana) characters instead of CP932 (ASCII, Katakana, Hiragana, Hanzi) characters.

How to reproduce:
1. Use Windows 10 or 11. Switch it to Japanese as main language.
2. Use the attached program. In the dev environment:
   $ gcc -Wall foo.c
3. In a cmd.exe console:
   $ chcp 932
   $ .\a
   Look at the output of the parts C and D.

Bug 2: When the application outputs a string, that starts with a non-ASCII character, using the function fwrite(), the console shows no output, and the stream's error indicator gets set.

How to reproduce:
1. Use Windows 10 or 11. Switch it to Japanese as main language.
2. Use the attached program. In the dev environment:
   $ gcc -Wall foo.c
3. In a cmd.exe console:
   $ chcp 932
   $ .\a
   Look at the output of the parts E and F.

I don't plan to add workarounds for these bugs to Gnulib, because
  * Normal applications don't write strings one byte at a time, for speed.
  * Normal applications use fwrite() for binary I/O and fputs() or [v][f]printf or similar for text I/O.

If anyone wants these bugs fixed, they will have to build their application against ucrt instead of msvcrt. The MSYS2 project contains tools and libraries for mingw+ucrt.
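The attached foo.c is not reproduced here. As a rough idea of the two output patterns involved, a minimal sketch might look like the following. It is not the attached program and does not correspond to its parts C to F. The helper names, the sample string ("日本語" in CP932 bytes) and the locale name "Japanese_Japan.932" are assumptions made for illustration.

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* "日本語" encoded in CP932 (assumed sample; three double-byte characters). */
static const char sample[] = "\x93\xfa\x96\x7b\x8c\xea";

/* Writes s one byte at a time with fputc(), the pattern of Bug 1.
   Returns 0 on success, -1 if fputc() reports an error. */
int write_bytewise(FILE *fp, const char *s)
{
    for (; *s != '\0'; s++)
        if (fputc((unsigned char)*s, fp) == EOF)
            return -1;
    return 0;
}

/* Writes s with a single fwrite() call, the pattern of Bug 2.
   Returns 0 only if all bytes were written and the stream's
   error indicator is not set. */
int write_block(FILE *fp, const char *s)
{
    size_t n = strlen(s);
    if (fwrite(s, 1, n, fp) != n || ferror(fp))
        return -1;
    return 0;
}

/* Selects the given locale, then sends the sample to stdout both ways.
   Returns -1 if the locale is unavailable, 0 if both writes succeeded,
   1 otherwise. */
int run_repro(const char *locale_name)
{
    if (setlocale(LC_ALL, locale_name) == NULL)
        return -1;
    int r1 = write_bytewise(stdout, sample);
    fputc('\n', stdout);
    int r2 = write_block(stdout, sample);
    fputc('\n', stdout);
    return (r1 == 0 && r2 == 0) ? 0 : 1;
}
```

To try it, call run_repro("Japanese_Japan.932") from main() and run the binary in a cmd.exe console after 'chcp 932'. Redirecting the output to a file should show the same six bytes on both lines, since neither bug occurs when the output goes to a file.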
(Btw, building with ucrt instead of msvcrt also has the benefit of supporting the UTF-8 locales of Windows. [1][2])

[1] https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/setlocale-wsetlocale
    "Starting in Windows 10 version 1803 (10.0.17134.0), the Universal C Runtime supports using a UTF-8 code page."
[2] https://lists.gnu.org/archive/html/bug-gnulib/2024-12/msg00159.html


2025-09-16  Bruno Haible  <[email protected]>

	Document msvcrt (native Windows) bugs regarding console output.
	* doc/posix-functions/fputc.texi: Document a bug found in msvcrt.
	* doc/posix-functions/putc.texi: Likewise.
	* doc/posix-functions/fwrite.texi: Document another bug found in msvcrt.
