This fixes the CESU-8 value written for a leftover lone high surrogate,
but not the missing encoding when the high surrogate is the last
character of the string.
--
Regards,
Christian
From 96f23496f249558949923e60270b9568956912bf Mon Sep 17 00:00:00 2001
From: Christian Franke <christian.fra...@t-online.de>
Date: Sun, 29 Jun 2025 19:03:36 +0200
Subject: [PATCH] wcrtomb: fix CESU-8 value of leftover lone high surrogate

Addresses: https://cygwin.com/pipermail/cygwin/2025-June/258378.html
Fixes: 6ff28fc3b121 ("Allow CESU-8 surrogate value encoding")
Signed-off-by: Christian Franke <christian.fra...@t-online.de>
---
newlib/libc/stdlib/wctomb_r.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/newlib/libc/stdlib/wctomb_r.c b/newlib/libc/stdlib/wctomb_r.c
index 5ea1e13e4..ec6adfa49 100644
--- a/newlib/libc/stdlib/wctomb_r.c
+++ b/newlib/libc/stdlib/wctomb_r.c
@@ -62,8 +62,8 @@ __utf8_wctomb (struct _reent *r,
of the surrogate and proceed to convert the given character. Note
to return extra 3 bytes. */
wchar_t tmp;
- tmp = (state->__value.__wchb[0] << 16 | state->__value.__wchb[1] << 8)
- - (0x10000 >> 10 | 0xd80d);
+ tmp = (((state->__value.__wchb[0] << 16 | state->__value.__wchb[1] << 8)
+ - 0x10000) >> 10) | 0xd800;
*s++ = 0xe0 | ((tmp & 0xf000) >> 12);
*s++ = 0x80 | ((tmp & 0xfc0) >> 6);
*s++ = 0x80 | (tmp & 0x3f);
--
2.45.1