I have updated my initial patch as proposed; please check whether you're OK with it.

There is also a second patch, which should add emulation of btowc and wctob for 
msvcrt.dll. I have not tested it, sorry about that.

- Kirill Makurin
________________________________
From: Kirill Makurin <maiddais...@outlook.com>
Sent: Friday, June 6, 2025 6:47 PM
To: lh_mo...@126.com <lh_mo...@126.com>; mingw-w64-public 
<mingw-w64-public@lists.sourceforge.net>
Subject: Re: [Mingw-w64-public] Inconsistent behavior of btowc with "C" locale

I can update the patch I initially attached as follows:

For btowc, check that the value is in range [0,255] instead of [0,127].

For wctob, remove the range check and use code page 20127 (ASCII) for the 
conversion (so that it matches the CRT's best-fit behavior).
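
To illustrate the btowc part, here is a minimal sketch of the "C"-locale 
branch only, assuming a plain range check is all that is needed there. The 
names sketch_btowc_c_locale and sketch_check_btowc_c_locale are hypothetical 
and not part of mingw-w64. The wctob side is left out because it depends on 
WideCharToMultiByte.

```c
#include <assert.h>
#include <stdio.h>   /* EOF */
#include <wchar.h>   /* wint_t, WEOF */

/* Hypothetical helper (not a mingw-w64 function): the "C"-locale branch
   of btowc. Bytes 0..255 map to the wide character with the same value;
   EOF and anything outside that range yield WEOF. */
wint_t sketch_btowc_c_locale (int c)
{
  if (c == EOF)
    return WEOF;

  /* The cast to unsigned also rejects negative values. */
  return (unsigned) c <= 0xFF ? (wint_t) c : WEOF;
}

/* Spot checks at the boundaries of the accepted range. */
void sketch_check_btowc_c_locale (void)
{
  assert (sketch_btowc_c_locale (0x00) == 0x00);
  assert (sketch_btowc_c_locale (0x7F) == 0x7F);
  assert (sketch_btowc_c_locale (0x80) == 0x80);
  assert (sketch_btowc_c_locale (0xFF) == 0xFF);
  assert (sketch_btowc_c_locale (0x100) == WEOF);
  assert (sketch_btowc_c_locale (-2) == WEOF);
  assert (sketch_btowc_c_locale (EOF) == WEOF);
}
```

Note that the upper bound is inclusive, so 0xFF itself must be accepted for 
the range to be [0,255].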

- Kirill Makurin


________________________________
From: LIU Hao
Sent: Thursday, June 5, 2025 1:04 PM
To: Kirill Makurin; mingw-w64-public@lists.sourceforge.net
Subject: Re: [Mingw-w64-public] Inconsistent behavior of btowc with "C" locale

On 2025-6-3 19:40, Kirill Makurin wrote:
> I think glibc just did not implement this behavior yet. There have been
> nearly zero real changes to its `btowc` since like 2007.
>
> https://sourceware.org/git/?p=glibc.git;a=history;f=wcsmbs/btowc.c;h=7be040ff6688d31da585d9075bdee54d231550d1;hb=refs/heads/master
>
> I think it should be ok for replacement's behavior to match CRT.
>

I agree with that.


--
Best regards,
LIU Hao

_______________________________________________
Mingw-w64-public mailing list
Mingw-w64-public@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mingw-w64-public
From 211c098c9b33c4360f19607a0c8ce8c7733ed003 Mon Sep 17 00:00:00 2001
From: Kirill Makurin <maiddais...@outlook.com>
Date: Fri, 6 Jun 2025 21:15:27 +0900
Subject: [PATCH 1/2] crt: check return value of ___lc_codepage_func() in btowc
 and wctob

When the current locale is "C", ___lc_codepage_func() returns 0.
When 0 (CP_ACP) is passed to MultiByteToWideChar or WideCharToMultiByte,
they use the code page returned by GetACP() for the conversion.

This may lead to unexpected behavior in programs relying on "C" locale
being consistent.

Check return value of ___lc_codepage_func(), and if it returns 0:

- for btowc, perform simple range check
- for wctob, use code page 20127 (ASCII) to perform conversion

Note: in "C" locale, btowc simply returns passed value if it is in
range [0,255]. This matches CRT's behavior.

Signed-off-by: Kirill Makurin <maiddais...@outlook.com>
---
 mingw-w64-crt/misc/btowc.c | 22 +++++++++++++---------
 mingw-w64-crt/misc/wctob.c | 26 ++++++++++++++++----------
 2 files changed, 29 insertions(+), 19 deletions(-)

diff --git a/mingw-w64-crt/misc/btowc.c b/mingw-w64-crt/misc/btowc.c
index c8fbd8e74..e203bbc44 100644
--- a/mingw-w64-crt/misc/btowc.c
+++ b/mingw-w64-crt/misc/btowc.c
@@ -15,14 +15,18 @@ wint_t btowc (int c)
 {
   if (c == EOF)
     return (WEOF);
-  else
-    {
-      unsigned char ch = c;
-      wchar_t wc = WEOF;
-      if (!MultiByteToWideChar (___lc_codepage_func(), MB_ERR_INVALID_CHARS,
-                                (char*)&ch, 1, &wc, 1))
-        return WEOF;
 
-      return wc;
-    }
+  unsigned cp = ___lc_codepage_func();
+
+  /* "C" locale */
+  if (cp == 0)
+    return (unsigned) c <= 0xFF ? c : WEOF;
+
+  unsigned char ch = c;
+  wchar_t wc = WEOF;
+
+  if (!MultiByteToWideChar (cp, MB_ERR_INVALID_CHARS, (char*)&ch, 1, &wc, 1))
+    return WEOF;
+
+  return wc;
 }
diff --git a/mingw-w64-crt/misc/wctob.c b/mingw-w64-crt/misc/wctob.c
index 995f6db6e..eddf9d20b 100644
--- a/mingw-w64-crt/misc/wctob.c
+++ b/mingw-w64-crt/misc/wctob.c
@@ -14,16 +14,22 @@
 #include <windows.h>
 
 /* Return just the first byte after translating to multibyte.  */
-int wctob (wint_t wc )
+int wctob (wint_t wc)
 {
-    wchar_t w = wc;
-    char c;
-    int invalid_char = 0;
-    if (!WideCharToMultiByte (___lc_codepage_func(),
-                             0 /* Is this correct flag? */,
-                             &w, 1, &c, 1, NULL, &invalid_char)
-        || invalid_char)
-      return EOF;
+  unsigned cp = ___lc_codepage_func();
 
-    return (unsigned char) c;
+  /* "C" locale, use code page 20127 (ASCII) for conversion */
+  if (cp == 0)
+    cp = 20127;
+
+  wchar_t w = wc;
+  char c;
+  int invalid_char = 0;
+
+  /* Do not use WC_NO_BEST_FIT_CHARS, CRT's wctob uses best-fit conversion */
+  if (!WideCharToMultiByte (cp, 0, &w, 1, &c, 1, NULL, &invalid_char)
+      || invalid_char)
+    return EOF;
+
+  return (unsigned char) c;
 }
-- 
2.46.1.windows.1

From 8c7d1f645bf4123999b9a65817b5618df7768471 Mon Sep 17 00:00:00 2001
From: Kirill Makurin <maiddais...@outlook.com>
Date: Fri, 6 Jun 2025 21:24:28 +0900
Subject: [PATCH 2/2] crt: emulate btowc and wctob

Use replacements for btowc and wctob only when they are not available in
system's msvcrt.dll.

Signed-off-by: Kirill Makurin <maiddais...@outlook.com>
---
 mingw-w64-crt/misc/btowc.c | 8 +++++++-
 mingw-w64-crt/misc/wctob.c | 8 +++++++-
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/mingw-w64-crt/misc/btowc.c b/mingw-w64-crt/misc/btowc.c
index e203bbc44..f6cafeb2f 100644
--- a/mingw-w64-crt/misc/btowc.c
+++ b/mingw-w64-crt/misc/btowc.c
@@ -11,7 +11,7 @@
 #include <stdio.h>
 #include <windows.h>
 
-wint_t btowc (int c)
+static wint_t __cdecl emu_btowc (int c)
 {
   if (c == EOF)
     return (WEOF);
@@ -30,3 +30,9 @@ wint_t btowc (int c)
 
   return wc;
 }
+
+#define RETT wint_t
+#define FUNC btowc
+#define ARGS int c
+#define CALL c
+#include "msvcrt_or_emu_glue.h"
diff --git a/mingw-w64-crt/misc/wctob.c b/mingw-w64-crt/misc/wctob.c
index eddf9d20b..32957d52a 100644
--- a/mingw-w64-crt/misc/wctob.c
+++ b/mingw-w64-crt/misc/wctob.c
@@ -14,7 +14,7 @@
 #include <windows.h>
 
 /* Return just the first byte after translating to multibyte.  */
-int wctob (wint_t wc)
+static int __cdecl emu_wctob (wint_t wc)
 {
   unsigned cp = ___lc_codepage_func();
 
@@ -33,3 +33,9 @@ int wctob (wint_t wc)
 
   return (unsigned char) c;
 }
+
+#define RETT int
+#define FUNC wctob
+#define ARGS wint_t wc
+#define CALL wc
+#include "msvcrt_or_emu_glue.h"
-- 
2.46.1.windows.1
