On 2025-09-29 16:52, Mark Geisert via Cygwin wrote:
Hello Thomas,
Apologies for the late response to your report.
On 7/21/2025 8:25 PM, Thomas Wolff via Cygwin wrote:
mbrtowc is broken in 3.6.4 which breaks non-BMP display in mintty.
Test case below.
Thomas
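#include <locale.h>
#include <wchar.h>
#include <stdio.h>

/* Feed a single byte to mbrtowc and print the result.
   A null mbstate_t pointer makes mbrtowc use its internal state,
   so the four calls below are decoded as one sequence. */
void mb(unsigned char c)
{
  wchar_t wc = 0;  /* stays 0 when mbrtowc stores nothing (-2) */
  int ret = (int) mbrtowc(&wc, (const char *) &c, 1, 0);
  printf("%02X -> %04X : %d\n", c, (unsigned) wc, ret);
}

int main(void)
{
  setlocale(LC_CTYPE, "");
  /* UTF-8 encoding of U+1F60E, one byte per call */
  mb(0xF0);
  mb(0x9F);
  mb(0x98);
  mb(0x8E);
  return 0;
}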
Running your testcase gives different output between 3.6.4 and 3.7.0-dev-139 but
I'm unsure the latter is correct. Can you comment please?
On 3.6.4:
~ ./a
F0 -> 0000 : -2
9F -> 0000 : -2
98 -> 0000 : -2
8E -> D83D : 3
On 3.7.0-dev-139:
~ ./a
F0 -> 0000 : -2
9F -> 0000 : -2
98 -> D83D : 1
8E -> DE0E : 1
A code point converter I have, run under 3.6.4, agrees with the surrogate values
from the latter, but the mbrtowc() return values should possibly be 2, 1 rather
than 1, 1; otherwise you don't know when the character is complete:
$ utf8cp $'\xf0\x9f\x98\x8e' 😎
😎 U+01f60e f0 9f 98 8e d83d de0e
😎 U+01f60e f0 9f 98 8e d83d de0e
expanded:
f0 == 1111 0/000  => 4 bytes - 3 bits  0 00     -- 0
9f == 10/01 1111  => 3       - 6       01 1111  -- 1f
98 == 10/01 1000  => 2       - 6       0110 00  -- 60
      hi 10 - 1101 10/00 0011 1101 == d8 3d
8e == 10/00 1110  => 1       - 6       00 1110  -- 0e
      lo 10 - 1101 11/10 0000 1110 == de 0e
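
For reference, the same arithmetic can be sketched in C. This is only an illustration of the bit layout above, not what newlib or Cygwin do internally; the helper names utf8_decode4 and to_surrogates are made up for this example, and it only handles 4-byte sequences.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: decode one 4-byte UTF-8 sequence into a code point.
   Returns 0 on success, -1 if the bytes are not a well-formed 4-byte
   sequence encoding U+10000..U+10FFFF. */
static int utf8_decode4(const unsigned char b[4], uint32_t *cp)
{
    if ((b[0] & 0xF8) != 0xF0)          /* lead byte must be 11110xxx */
        return -1;
    for (int i = 1; i < 4; i++)
        if ((b[i] & 0xC0) != 0x80)      /* trail bytes must be 10xxxxxx */
            return -1;

    uint32_t v = ((uint32_t)(b[0] & 0x07) << 18)   /* 3 payload bits */
               | ((uint32_t)(b[1] & 0x3F) << 12)   /* 6 payload bits */
               | ((uint32_t)(b[2] & 0x3F) << 6)    /* 6 payload bits */
               |  (uint32_t)(b[3] & 0x3F);         /* 6 payload bits */

    if (v < 0x10000 || v > 0x10FFFF)    /* overlong or out of range */
        return -1;
    *cp = v;
    return 0;
}

/* Hypothetical helper: split a supplementary-plane code point into a
   UTF-16 surrogate pair (high 10 bits, low 10 bits of cp - 0x10000). */
static void to_surrogates(uint32_t cp, uint16_t *hi, uint16_t *lo)
{
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    uint32_t v = cp - 0x10000;
    *hi = (uint16_t)(0xD800 | (v >> 10));
    *lo = (uint16_t)(0xDC00 | (v & 0x3FF));
}
```

Calling utf8_decode4 on the bytes f0 9f 98 8e gives U+1F60E, and to_surrogates on that gives d83d de0e, matching the utf8cp output. Note that the high surrogate needs bits from the first three bytes only, while the low surrogate also needs the fourth.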
--
Take care. Thanks, Brian Inglis Calgary, Alberta, Canada
La perfection est atteinte                     Perfection is achieved
non pas lorsqu'il n'y a plus rien à ajouter    not when there is no more to add
mais lorsqu'il n'y a plus rien à retrancher    but when there is no more to cut
                                -- Antoine de Saint-Exupéry