This patch fixes the CESU-8 value written for a leftover lone high surrogate. It does not fix the missing encoding when the high surrogate is at the very end of the string.
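
For illustration only, here is a standalone sketch of the arithmetic the patch changes. It is not the newlib code. The names surrogate_state, save_high_surrogate, recover_old, recover_new and cesu8_encode are hypothetical, uint16_t stands in for a 16-bit wchar_t, and the way the high surrogate is packed into the two state bytes is an assumption about how __utf8_wctomb saves it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two state bytes (__wchb[0], __wchb[1]). */
struct surrogate_state { unsigned char wchb[2]; };

/* ASSUMPTION: the high surrogate is saved as the upper bits of the
   partially built code point, ((hi & 0x3ff) << 10) + 0x10000. */
static void save_high_surrogate(struct surrogate_state *st, uint16_t hi)
{
  uint32_t tmp = ((uint32_t)(hi & 0x3ff) << 10) + 0x10000;
  st->wchb[0] = (tmp >> 16) & 0xff;
  st->wchb[1] = (tmp >> 8) & 0xff;
}

/* Expression removed by the patch: the shift and the OR apply only to the
   constants, so this subtracts 0xd84d instead of undoing the packing. */
static uint16_t recover_old(const struct surrogate_state *st)
{
  uint32_t v = (uint32_t)st->wchb[0] << 16 | (uint32_t)st->wchb[1] << 8;
  return (uint16_t)(v - (0x10000 >> 10 | 0xd80d));
}

/* Expression added by the patch: subtract 0x10000, shift back down by 10,
   then restore the 0xd800 high-surrogate prefix. */
static uint16_t recover_new(const struct surrogate_state *st)
{
  uint32_t v = (uint32_t)st->wchb[0] << 16 | (uint32_t)st->wchb[1] << 8;
  return (uint16_t)(((v - 0x10000) >> 10) | 0xd800);
}

/* Three-byte encoding of a 16-bit value, same bit layout as the
   unchanged context lines of the diff. */
static void cesu8_encode(uint16_t w, unsigned char out[3])
{
  out[0] = 0xe0 | ((w & 0xf000) >> 12);
  out[1] = 0x80 | ((w & 0xfc0) >> 6);
  out[2] = 0x80 | (w & 0x3f);
}

/* Round-trip check over every high surrogate, 0xd800 through 0xdbff. */
static void check_all_high_surrogates(void)
{
  struct surrogate_state st;
  for (uint32_t hi = 0xd800; hi <= 0xdbff; hi++) {
    save_high_surrogate(&st, (uint16_t)hi);
    assert(recover_new(&st) == hi);
  }
}
```

Under these assumptions, saving 0xd83d and calling recover_new gives back 0xd83d, which cesu8_encode turns into the bytes ED A0 BD, while recover_old yields a value outside the surrogate range.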

--
Regards,
Christian

From 96f23496f249558949923e60270b9568956912bf Mon Sep 17 00:00:00 2001
From: Christian Franke <christian.fra...@t-online.de>
Date: Sun, 29 Jun 2025 19:03:36 +0200
Subject: [PATCH] wcrtomb: fix CESU-8 value of leftover lone high surrogate

Addresses: https://cygwin.com/pipermail/cygwin/2025-June/258378.html
Fixes: 6ff28fc3b121 ("Allow CESU-8 surrogate value encoding")
Signed-off-by: Christian Franke <christian.fra...@t-online.de>
---
 newlib/libc/stdlib/wctomb_r.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/newlib/libc/stdlib/wctomb_r.c b/newlib/libc/stdlib/wctomb_r.c
index 5ea1e13e4..ec6adfa49 100644
--- a/newlib/libc/stdlib/wctomb_r.c
+++ b/newlib/libc/stdlib/wctomb_r.c
@@ -62,8 +62,8 @@ __utf8_wctomb (struct _reent *r,
         of the surrogate and proceed to convert the given character.  Note
         to return extra 3 bytes. */
       wchar_t tmp;
-      tmp = (state->__value.__wchb[0] << 16 | state->__value.__wchb[1] << 8)
-           - (0x10000 >> 10 | 0xd80d);
+      tmp = (((state->__value.__wchb[0] << 16 | state->__value.__wchb[1] << 8)
+           - 0x10000) >> 10) | 0xd800;
       *s++ = 0xe0 | ((tmp & 0xf000) >> 12);
       *s++ = 0x80 | ((tmp &  0xfc0) >> 6);
       *s++ = 0x80 |  (tmp &   0x3f);
-- 
2.45.1
