I hope that {Get,Set}Console[Output]CP are the only code-page-related functions 
that I still need to take into account and have not yet tested myself. I guess 
I'll have to deal with them when I get to the stdio I/O functions.

- Kirill Makurin
________________________________
From: Pali Rohár <[email protected]>
Sent: Thursday, January 1, 2026 11:00 AM
To: Kirill Makurin <[email protected]>
Cc: mingw-w64-public <[email protected]>
Subject: Re: [Mingw-w64-public] [PATCH 1/3] crt: Improve __mingw_filename_cp() 
to work on systems without AreFileApisANSI() function

Yes, at least this is how I see it.

On Thursday 01 January 2026 01:58:07 Kirill Makurin wrote:
> This ancient stuff makes my head spin.
>
> So, if I follow correctly:
>
> - Before AreFileApisANSI() was added, filename functions used the ANSI code 
> page (well, that's good to know)
> - SetFileApisToOEM() was likely added to allow passing console input (which 
> was OEM) to filename functions without an extra OEM-to-ANSI conversion, or 
> vice versa
> - crtdll.dll/msvcrt10.dll using OEM code pages is just something that 
> made sense back then (e.g. for processing console input)
>
> And we're trying to make sense of it all now :)
>
> - Kirill Makurin
> ________________________________
> From: Pali Rohár <[email protected]>
> Sent: Thursday, January 1, 2026 10:44 AM
> To: Kirill Makurin <[email protected]>
> Cc: mingw-w64-public <[email protected]>
> Subject: Re: [Mingw-w64-public] [PATCH 1/3] crt: Improve 
> __mingw_filename_cp() to work on systems without AreFileApisANSI() function
>
> The fact that crtdll.dll and msvcrt10.dll use the OEM code page in the C
> locale by default can be really confusing. But on the other hand, this is
> fine from the C point of view (the runtime can use whatever it wants). There
> are ways to query the code page via CRT functions, so it is fine. A properly
> written C or POSIX application should not expect any specific locale or
> code page.
>
> Since WinNT 3.5 and Win95, FAT stores each filename twice: once in the old
> 8.3 shortname format, which is in the OEM code page, and a second time in
> the new LFN format, which is UTF-16 with space for 520 bytes (somehow
> limited to just 255 code units). So changing the system OEM code page does
> not change file names (unless they are stored only in the 8.3 shortname
> format). The only exception in FAT that is always stored in the shortname
> format is the LABEL. So funny things can happen: a FAT flash disk can show
> a different LABEL depending on the Windows system language.
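
The 520-byte figure is consistent with the LFN layout as I understand it: one LFN directory entry carries 13 UTF-16 code units, a name can chain up to 20 such entries, and the name itself is capped at 255 code units. A minimal sketch of that arithmetic follows; the constant and function names are made up for illustration and do not come from any Windows header.

```c
#include <stddef.h>

/* Layout constants of FAT long file names (VFAT); names are illustrative. */
enum {
    LFN_UNITS_PER_ENTRY = 13,  /* UTF-16 code units in one LFN entry */
    LFN_MAX_ENTRIES     = 20,  /* longest chain of LFN entries       */
    LFN_MAX_NAME_UNITS  = 255  /* longest name the format allows     */
};

/* Raw capacity of a full chain in bytes (2 bytes per code unit). */
size_t lfn_chain_capacity_bytes(void)
{
    return (size_t)LFN_MAX_ENTRIES * LFN_UNITS_PER_ENTRY * 2;
}

/* Number of LFN entries needed for a name of `units` code units,
 * or 0 if the name is empty or longer than the format allows. */
size_t lfn_entries_needed(size_t units)
{
    if (units == 0 || units > LFN_MAX_NAME_UNITS)
        return 0;
    return (units + LFN_UNITS_PER_ENTRY - 1) / LFN_UNITS_PER_ENTRY;
}
```

A 255-unit name needs all 20 entries, so the chain has room for 260 units (520 bytes) even though only 255 may be used.
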
>
> As the CRT code page is not used for filenames, the remaining possible
> usage is console input / output. And here, by default, all Windows versions
> use the OEM code page for interaction with the console. So my guess is that
> older CRT versions use the OEM code page by default so that console input
> and output can be "correctly" processed without the need for a conversion
> function.
>
> As written in that older MS SetFileApisToOEM() documentation, the function
> was there to allow passing the result of FindFirstFileA() to
> WriteConsoleA().
>
> So switching file apis to OEM, using old crtdll.dll and msvcrt10.dll
> (heh, at the release time it was new CRT) ensured that every char* was
> in OEM codepage and no conversion was needed.
>
> On Thursday 01 January 2026 01:15:10 Kirill Makurin wrote:
> > Thanks for testing. I should have chosen better wording.
> >
> > I was aware that CRT functions which accept filenames depend on 
> > `AreFileApisANSI` and do not use LC_CTYPE's code page (which is very 
> > confusing if you ask me, given that most people have no idea about 
> > `AreFileApisANSI`); I was missing only one detail: when the CRT locale is 
> > set to UTF-8, it ignores `AreFileApisANSI` and uses UTF-8.
> >
> > With newer CRTs, assuming that AreFileApisANSI() == TRUE by default, in 
> > most cases calling setlocale(LC_ALL, "") will set locale to active ANSI 
> > code page (GetACP); notable exception is pre-UCRT with GetACP() == CP_UTF8. 
> > This way, passing filenames to both Win32 -A and CRT (_open/fopen) 
> > functions will give correct results.
> >
> > With crtdll.dll and msvcrt10.dll, which by default use OEM code pages, the 
> > two will disagree on the code page used by such functions. "Confusing" 
> > would be more appropriate than "nonsensical".
> >
> > Please correct me if I'm wrong, but doesn't FAT use the active OEM code 
> > page for filenames? Maybe that has to do with OEM code pages being the 
> > default when setting locales in the ancient CRTs?
> >
> > - Kirill Makurin
> > ________________________________
> > From: Pali Rohár <[email protected]>
> > Sent: Thursday, January 1, 2026 9:42 AM
> > To: Kirill Makurin <[email protected]>
> > Cc: mingw-w64-public <[email protected]>
> > Subject: Re: [Mingw-w64-public] [PATCH 1/3] crt: Improve 
> > __mingw_filename_cp() to work on systems without AreFileApisANSI() function
> >
> > > Well, it is questionable whether it is nonsensical. It is just that the
> > > file APIs do not use the CRT LC_CTYPE code page in any CRT version
> > > (except when it is set to UTF-8 in UCRT).
> > >
> > > I tried to run that test on Win95 too, so you would have the full picture.
> > > But the test said that it does not map to any character, which is very
> > > suspicious. After debugging the test, I figured out that it failed on the
> > > CreateFileW call, which returned NULL, not INVALID_HANDLE_VALUE (-1).
> > >
> > > CreateFileW is unsupported on Win9x, but it does not signal the error via
> > > INVALID_HANDLE_VALUE (-1) as documented; it returns NULL (0) instead.
> > >
> > > A NULL handle is also invalid, so maybe we should be more careful when
> > > calling WinAPI functions and check that the returned handle is not NULL.
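
One way to apply that extra care is a small helper that rejects both failure values. This is only a sketch: `handle_is_usable` is a hypothetical name, and the non-Windows branch defines stand-ins for `HANDLE` and `INVALID_HANDLE_VALUE` purely so the snippet compiles outside Windows.

```c
#include <stddef.h>
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>
#else
/* Stand-ins (assumption) so the sketch also compiles outside Windows. */
typedef void *HANDLE;
#define INVALID_HANDLE_VALUE ((HANDLE)(intptr_t)-1)
#endif

/* Returns 1 only for a handle that is neither NULL nor
 * INVALID_HANDLE_VALUE, covering both failure conventions. */
int handle_is_usable(HANDLE h)
{
    return h != NULL && h != INVALID_HANDLE_VALUE;
}
```

A caller would then write `if (!handle_is_usable(handle)) { ... }` after CreateFileA/CreateFileW instead of comparing against INVALID_HANDLE_VALUE alone.
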
> >
> > > So I tried a different test: create 3 new files via 3 different
> > > calls:
> >
> > #include <stdio.h>
> > #include <io.h>
> > #include <fcntl.h>
> > #include <sys/stat.h>
> > #include <locale.h>
> > > #include <errno.h>
> > > #include <string.h>
> > #include <windows.h>
> >
> > int main() {
> >     HANDLE handle;
> >     int fd;
> >
> >     printf("WinAPI ANSI codepage: %u\n", GetACP());
> >     printf("WinAPI OEM codepage: %u\n", GetOEMCP());
> >
> >     if (GetACP() != 1250 || GetOEMCP() != 852) {
> > >         printf("This test is specially for ANSI codepage 1250 and OEM codepage 852\n");
> >         return 1;
> >     }
> >
> >     handle = CreateFileA("C:\\1_\xED.txt", 0, 0, NULL, CREATE_ALWAYS, 
> > FILE_ATTRIBUTE_NORMAL, NULL);
> >     if (handle == NULL || handle == INVALID_HANDLE_VALUE) {
> >         printf("Cannot create file C:\\1_\\xED.txt: %lu\n", GetLastError());
> >         return 1;
> >     }
> >     CloseHandle(handle);
> >
> > >     fd = _open("C:\\2_\xED.txt", _O_RDONLY | _O_CREAT, _S_IREAD | _S_IWRITE);
> > >     if (fd < 0) {
> > >         printf("Cannot create file C:\\2_\\xED.txt: %d\n", errno);
> >         return 1;
> >     }
> >     _close(fd);
> >
> >     printf("Calling setlocale\n");
> >     setlocale(LC_ALL, "");
> >
> >     printf("CRT codepage: %s\n", strchr(setlocale(LC_CTYPE, NULL), '.'));
> >
> > >     fd = _open("C:\\3_\xED.txt", _O_RDONLY | _O_CREAT, _S_IREAD | _S_IWRITE);
> > >     if (fd < 0) {
> > >         printf("Cannot create file C:\\3_\\xED.txt: %d\n", errno);
> >         return 1;
> >     }
> >     _close(fd);
> >
> >     printf("Done\n");
> >     return 0;
> > }
> >
> > And the result on Win95 is:
> >
> > WinAPI ANSI codepage: 1250
> > WinAPI OEM codepage: 852
> > Calling setlocale
> > CRT codepage: (null)
> > Done
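
A side note on the "(null)" line: strchr() returns NULL when the locale name contains no '.', and passing NULL to %s is undefined behavior in standard C, even though many runtimes print "(null)". What the locale name actually was on Win95 is not shown here. A minimal, defensive sketch for this kind of diagnostic could look as follows; `locale_codepage_suffix` is a hypothetical helper name.

```c
#include <string.h>

/* Returns the ".<codepage>" suffix of a locale name, or "" when the
 * name is NULL or has no '.' (for example the plain "C" locale). */
const char *locale_codepage_suffix(const char *locale_name)
{
    const char *dot = locale_name ? strchr(locale_name, '.') : NULL;
    return dot ? dot : "";
}
```

It would be used as `printf("CRT codepage: %s\n", locale_codepage_suffix(setlocale(LC_CTYPE, NULL)));`.
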
> >
> > > All 3 files on the FAT filesystem were created as LFN in UTF-16 and contain
> > > code point U+ED. Hence WinAPI, CRT before setlocale, and CRT after setlocale
> > > all used the ANSI code page. (OEM would result in code point U+DD.)
> >
> > Thanks to UTF-16 LFN support in FAT, it is possible to inspect the real
> > encoding.
> >
> > This behavior seems to be consistent across all Windows versions, and
> > matches what we have in __mingw_filename_cp() function.
> >
> > > Hopefully this now answers all your questions regarding the file API
> > > encodings.
> >
> > On Wednesday 31 December 2025 22:13:36 Kirill Makurin wrote:
> > > > So, we just discovered another nonsensical Windows behavior... With 
> > > ancient CRTs on ancient systems `setlocale (LC_ALL, "")` will set locale 
> > > to OEM code page, but FileAPIs will use ANSI code pages.
> > >
> > > I still wonder how Win9x system would behave (aren't they DOS-based? I'm 
> > > sorry if I'm mistaken), but I guess there's nearly zero practical value 
> > > in testing them, or is it?
> > >
> > > I actually followed behavior of using OEM code pages by default with 
> > > crtdll.dll and msvcrt10.dll in my library. Now I question whether I 
> > > should use ANSI code pages by default with all CRTs...
> > >
> > > - Kirill Makurin
> > > ________________________________
> > > From: Pali Rohár <[email protected]>
> > > Sent: Thursday, January 1, 2026 4:09 AM
> > > To: Kirill Makurin <[email protected]>
> > > Cc: mingw-w64-public <[email protected]>
> > > Subject: Re: [Mingw-w64-public] [PATCH 1/3] crt: Improve 
> > > __mingw_filename_cp() to work on systems without AreFileApisANSI() 
> > > function
> > >
> > > > Ok, you gave me an interesting idea, so I decided to test it today.
> > > > I prepared the oldest Windows NT version, 3.1, which is available here:
> > > > https://winworldpc.com/product/windows-nt-3x/31
> > > > configured the system to ANSI CP 1250 and OEM CP 852, and then compiled
> > > > and ran the following test code against the system crtdll.dll:
> > >
> > > #include <stdio.h>
> > > #include <io.h>
> > > #include <fcntl.h>
> > > > #include <locale.h>
> > > > #include <string.h>
> > > #include <windows.h>
> > >
> > > /*
> > >  * UNICODE code point U+ED = LATIN SMALL LETTER I WITH ACUTE (í)
> > >  * UNICODE code point U+DD = LATIN CAPITAL LETTER Y WITH ACUTE (Ý)
> > >  * CP1250 0xED = LATIN SMALL LETTER I WITH ACUTE (í)
> > >  * CP1250 0xDD = LATIN CAPITAL LETTER Y WITH ACUTE (Ý)
> > >  * CP852 0xA1 = LATIN SMALL LETTER I WITH ACUTE (í)
> > >  * CP852 0xED = LATIN CAPITAL LETTER Y WITH ACUTE (Ý)
> > >  */
> > >
> > > int main() {
> > >     HANDLE handle;
> > >     int fd;
> > >
> > >     printf("WinAPI ANSI codepage: %u\n", GetACP());
> > >     printf("WinAPI OEM codepage: %u\n", GetOEMCP());
> > >
> > >     if (GetACP() != 1250 || GetOEMCP() != 852) {
> > > >         printf("This test is specially for ANSI codepage 1250 and OEM codepage 852\n");
> > >         return 1;
> > >     }
> > >
> > >     handle = CreateFileW(L"C:\\\xED", 0, 0, NULL, CREATE_ALWAYS, 
> > > FILE_ATTRIBUTE_NORMAL, NULL);
> > >     if (handle == INVALID_HANDLE_VALUE) {
> > >         printf("Cannot create file \\xED: %lu\n", GetLastError());
> > >         return 1;
> > >     }
> > >     CloseHandle(handle);
> > >
> > >     handle = CreateFileW(L"C:\\\xED", 0, 0, NULL, OPEN_EXISTING, 
> > > FILE_ATTRIBUTE_NORMAL, NULL);
> > >     if (handle == INVALID_HANDLE_VALUE) {
> > >         printf("Cannot open created file \\xED: %lu\n", GetLastError());
> > >         DeleteFileW(L"C:\\\xED");
> > >         return 1;
> > >     }
> > >     CloseHandle(handle);
> > >
> > >     handle = CreateFileA("C:\\\xDD", 0, 0, NULL, OPEN_EXISTING, 0, NULL);
> > >     printf("WinAPI CreateFileA \\xDD is mapped to UNICODE \\xED: %s\n", 
> > > handle != INVALID_HANDLE_VALUE ? "yes" : "no");
> > >     if (handle != INVALID_HANDLE_VALUE) CloseHandle(handle);
> > >
> > >     handle = CreateFileA("C:\\\xED", 0, 0, NULL, OPEN_EXISTING, 0, NULL);
> > >     printf("WinAPI CreateFileA \\xED is mapped to UNICODE \\xED: %s\n", 
> > > handle != INVALID_HANDLE_VALUE ? "yes" : "no");
> > >     if (handle != INVALID_HANDLE_VALUE) CloseHandle(handle);
> > >
> > >     handle = CreateFileA("C:\\\xA1", 0, 0, NULL, OPEN_EXISTING, 0, NULL);
> > >     printf("WinAPI CreateFileA \\xA1 is mapped to UNICODE \\xED: %s\n", 
> > > handle != INVALID_HANDLE_VALUE ? "yes" : "no");
> > >     if (handle != INVALID_HANDLE_VALUE) CloseHandle(handle);
> > >
> > >     fd = _open("C:\\\xDD", _O_RDONLY);
> > >     printf("CRT _open \\xDD is mapped to UNICODE \\xED: %s\n", fd >= 0 ? 
> > > "yes" : "no");
> > >     if (fd >= 0) _close(fd);
> > >
> > >     fd = _open("C:\\\xED", _O_RDONLY);
> > >     printf("CRT _open \\xED is mapped to UNICODE \\xED: %s\n", fd >= 0 ? 
> > > "yes" : "no");
> > >     if (fd >= 0) _close(fd);
> > >
> > >     fd = _open("C:\\\xA1", _O_RDONLY);
> > >     printf("CRT _open \\xA1 is mapped to UNICODE \\xED: %s\n", fd >= 0 ? 
> > > "yes" : "no");
> > >     if (fd >= 0) _close(fd);
> > >
> > >     printf("Calling setlocale\n");
> > >     setlocale(LC_ALL, "");
> > >
> > >     printf("CRT codepage: %s\n", strchr(setlocale(LC_CTYPE, NULL), '.'));
> > >
> > >     fd = _open("C:\\\xDD", _O_RDONLY);
> > >     printf("CRT _open \\xDD is mapped to UNICODE \\xED: %s\n", fd >= 0 ? 
> > > "yes" : "no");
> > >     if (fd >= 0) _close(fd);
> > >
> > >     fd = _open("C:\\\xED", _O_RDONLY);
> > >     printf("CRT _open \\xED is mapped to UNICODE \\xED: %s\n", fd >= 0 ? 
> > > "yes" : "no");
> > >     if (fd >= 0) _close(fd);
> > >
> > >     fd = _open("C:\\\xA1", _O_RDONLY);
> > >     printf("CRT _open \\xA1 is mapped to UNICODE \\xED: %s\n", fd >= 0 ? 
> > > "yes" : "no");
> > >     if (fd >= 0) _close(fd);
> > >
> > >     DeleteFileW(L"C:\\\xED");
> > >     return 0;
> > > }
> > >
> > >
> > > The output is:
> > >
> > > WinAPI ANSI codepage: 1250
> > > WinAPI OEM codepage: 852
> > > WinAPI CreateFileA \xDD is mapped to UNICODE \xED: no
> > > WinAPI CreateFileA \xED is mapped to UNICODE \xED: yes
> > > WinAPI CreateFileA \xA1 is mapped to UNICODE \xED: no
> > > CRT _open \xDD is mapped to UNICODE \xED: no
> > > CRT _open \xED is mapped to UNICODE \xED: yes
> > > CRT _open \xA1 is mapped to UNICODE \xED: no
> > > Calling setlocale
> > > CRT codepage: .852
> > > CRT _open \xDD is mapped to UNICODE \xED: no
> > > CRT _open \xED is mapped to UNICODE \xED: yes
> > > CRT _open \xA1 is mapped to UNICODE \xED: no
> > >
> > >
> > > So from this test it can be seen that the CRT code page after setlocale
> > > is the OEM one, but both WinAPI -A and CRT _open functions for filenames
> > > are using ANSI codepage.
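
The reasoning behind the test can be sketched in portable C: the file is named U+00ED, so whichever byte manages to open it reveals the code page used for the conversion. The table below contains only the four mappings listed in the test's own comment; `codepage_for` is a hypothetical helper and not part of any CRT.

```c
#include <stddef.h>

/* One byte-to-code-point mapping in a given code page. */
struct cp_entry {
    unsigned      codepage;
    unsigned char byte;
    unsigned      codepoint;
};

/* Partial table: only the mappings named in the test's comment. */
static const struct cp_entry cp_table[] = {
    { 1250, 0xED, 0x00ED },  /* LATIN SMALL LETTER I WITH ACUTE   */
    { 1250, 0xDD, 0x00DD },  /* LATIN CAPITAL LETTER Y WITH ACUTE */
    {  852, 0xA1, 0x00ED },  /* LATIN SMALL LETTER I WITH ACUTE   */
    {  852, 0xED, 0x00DD },  /* LATIN CAPITAL LETTER Y WITH ACUTE */
};

/* Returns the code page in which `byte` decodes to `codepoint`,
 * or 0 if the partial table has no such mapping. */
unsigned codepage_for(unsigned char byte, unsigned codepoint)
{
    size_t i;
    for (i = 0; i < sizeof cp_table / sizeof cp_table[0]; i++)
        if (cp_table[i].byte == byte && cp_table[i].codepoint == codepoint)
            return cp_table[i].codepage;
    return 0;
}
```

Only "\xED" opened the U+00ED file, and `codepage_for(0xED, 0x00ED)` is 1250, the ANSI code page. Had the OEM code page been in use, "\xA1" would have been the name that worked.
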
> > >
> > > So it matches the __mingw_filename_cp() implementation.
> > >
> > > Hence filenames are another exception in CRT which do not follow CRT
> > > codepage, but rather the codepage returned by our helper
> > > __mingw_filename_cp() function.
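
Reduced to a pure function, the decision described here looks roughly like this. It is a sketch only: the `SKETCH_` constants mirror the numeric values of CP_ACP, CP_OEMCP and CP_UTF8, and the two flag parameters are hypothetical stand-ins for "AreFileApisANSI() could be resolved" and its return value.

```c
/* Same numeric values as CP_ACP, CP_OEMCP and CP_UTF8 in the Windows
 * headers, redefined so the sketch compiles anywhere. */
enum {
    SKETCH_CP_ACP   = 0,
    SKETCH_CP_OEMCP = 1,
    SKETCH_CP_UTF8  = 65001
};

/* lc_codepage:     code page of the current CRT LC_CTYPE locale
 * probe_available: nonzero if AreFileApisANSI() exists on this system
 * file_apis_ansi:  its result (ignored when the probe is unavailable) */
unsigned sketch_filename_cp(unsigned lc_codepage, int probe_available,
                            int file_apis_ansi)
{
    if (lc_codepage == SKETCH_CP_UTF8)
        return SKETCH_CP_UTF8;   /* a UTF-8 locale overrides everything */
    if (!probe_available)
        return SKETCH_CP_ACP;    /* ancient systems: file APIs are ANSI */
    return file_apis_ansi ? SKETCH_CP_ACP : SKETCH_CP_OEMCP;
}
```

With the NT 3.1 result above (CRT code page 852, no AreFileApisANSI()), `sketch_filename_cp(852, 0, 0)` yields the ANSI code page, which matches what the test observed.
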
> > >
> > >
> > > I have found old MS documentation for SetFileApisToOEM() which says:
> > > https://web.archive.org/web/20001206100700/http://msdn.microsoft.com/library/psdk/winbase/filesio_5xv1.htm
> > >
> > > "The 8-bit console functions use the OEM code page by default.
> > > All other functions use the ANSI code page by default."
> > >
> > > > I searched a bit more and found another piece of documentation; I am
> > > > not sure what its source is, but it says the same thing and contains
> > > > an old note:
> > > http://winapi.freetechsecrets.com/win32/WIN32SetFileApisToOEM.htm
> > >
> > > "On Win32s all file APIs are ANSI."
> > >
> > > > So it looks like ancient systems use only ANSI for file names,
> > > > but only OEM for console functions.
> > >
> > > On Wednesday 31 December 2025 09:53:38 Kirill Makurin wrote:
> > > > I have a funny feeling that ancient systems which do not have 
> > > > AreFileApisANSI may use OEM code page for filenames, but I do not have 
> > > > a way to verify it.
> > > >
> > > > Remember that crtdll.dll and msvcrt10.dll use OEM code pages by 
> > > > default, this makes me think that such ancient system would use OEM 
> > > > code pages in general.
> > > >
> > > > - Kirill Makurin
> > > > ________________________________
> > > > From: Pali Rohár <[email protected]>
> > > > Sent: Wednesday, December 31, 2025 3:29 AM
> > > > To: [email protected] 
> > > > <[email protected]>
> > > > Subject: [Mingw-w64-public] [PATCH 1/3] crt: Improve 
> > > > __mingw_filename_cp() to work on systems without AreFileApisANSI() 
> > > > function
> > > >
> > > > > If the AreFileApisANSI() function is not available, then fall back to
> > > > > the default, which is that ANSI (ACP) encoding is used for filenames.
> > > > ---
> > > >  mingw-w64-crt/misc/__mingw_filename_cp.c | 17 ++++++++++++++---
> > > >  1 file changed, 14 insertions(+), 3 deletions(-)
> > > >
> > > > > diff --git a/mingw-w64-crt/misc/__mingw_filename_cp.c b/mingw-w64-crt/misc/__mingw_filename_cp.c
> > > > index f6de486e983b..4fc92fcb269f 100644
> > > > --- a/mingw-w64-crt/misc/__mingw_filename_cp.c
> > > > +++ b/mingw-w64-crt/misc/__mingw_filename_cp.c
> > > > @@ -10,9 +10,20 @@
> > > >  #include <windows.h>
> > > >  #include <locale.h>
> > > >
> > > > > +/* By default the ANSI (ACP) is used, fallback to default ANSI when function AreFileApisANSI() is not available */
> > > > +static BOOL WINAPI fallbackAreFileApisANSI(VOID) { return TRUE; }
> > > > +
> > > >  unsigned int __cdecl __mingw_filename_cp(void)
> > > >  {
> > > > -  return (___lc_codepage_func() == CP_UTF8)
> > > > -         ? CP_UTF8
> > > > -         : AreFileApisANSI() ? CP_ACP : CP_OEMCP;
> > > > +  if (___lc_codepage_func() == CP_UTF8)
> > > > +    return CP_UTF8;
> > > > +
> > > > > +  /* Function AreFileApisANSI() is not available in older Windows versions, so resolve it at runtime */
> > > > +  static BOOL (WINAPI *myAreFileApisANSI)(VOID) = NULL;
> > > > +  if (!myAreFileApisANSI) {
> > > > +    HMODULE kernel32 = GetModuleHandleA("kernel32.dll");
> > > > > +    FARPROC farproc = kernel32 ? GetProcAddress(kernel32, "AreFileApisANSI") : NULL;
> > > > > +    (void)InterlockedExchangePointer((PVOID*)&myAreFileApisANSI, farproc ?: fallbackAreFileApisANSI);
> > > > +  }
> > > > +  return myAreFileApisANSI() ? CP_ACP : CP_OEMCP;
> > > >  }
> > > > --
> > > > 2.20.1
> > > >
> > > >
> > > >
> > > > _______________________________________________
> > > > Mingw-w64-public mailing list
> > > > [email protected]
> > > > https://lists.sourceforge.net/lists/listinfo/mingw-w64-public

_______________________________________________
Mingw-w64-public mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mingw-w64-public
