Re: wcwidth broken with gcc 16

Thomas Wolff via Cygwin Fri, 12 Jun 2026 03:16:09 -0700

Hi Doug,

Am 02.06.2026 um 20:43 schrieb Doug Henderson via Cygwin:

  On Tue, Jun 2, 2026 at 1:46 AM Thomas Wolff via Cygwin <[email protected]>
wrote:


Thomas Wolff, et al.,

I have been carefully following this discussion of technical issues, API
conformance, and related matters.

There has been no significant discussion of the user experience that
depends on the resolution of this problem.

Will non-BMP Unicode code points display correctly in terminal windows
(that use an appropriate font), e.g. mintty?

I noticed the problem when I compiled mintty on a system where I had
installed the test release of gcc 16 and various display widths were
wrong, actually also within the BMP. So a fix was needed, and it turned
out that the issue was caused by, as I still think, wrong sign-extension
of a 16-bit value to a 32-bit parameter by gcc 16, where previous gcc
versions had only applied unsigned (zero) extension. Well, for subtle
reasons, this was not accepted as a bug in gcc, so the cygwin API had to
be adjusted.

About non-BMP characters: requesting their width with the wcwidth
function never used to work (my first plan was to make it work, but this
was rejected as not standard-conformant, and in fact it would not have
been portable, so actually useless in the first place). If a program
wants to determine the locale width of a non-BMP character on a
UTF-16-based system, it needs to split it into its Unicode surrogates
and use the wcswidth function. This was also done by mintty already, so
there was no non-BMP-specific problem.

This will be handled under the hood by mintty; applications can of
course use UTF-8 4-byte sequences towards the terminal.

Non-BMP Unicode code points include emojis, mathematical script glyphs, and 
many others.

For me, I care if a small Python script like:

$ type main.py
print("U+01D49E ‹𝒞›  GC=Lu    MATHEMATICAL SCRIPT CAPITAL C")

outputs:
U+01D49E ‹𝒞›  GC=Lu    MATHEMATICAL SCRIPT CAPITAL C

Your test case does not cover the problem. Plain output will always
appear properly as long as you don't reach the end of the line. The
question is whether, for example, python has an idea of the current
cursor position and whether it is consistent with the output.

Maybe you can try to evaluate that too.

on Windows Terminal, as expected

and when running cygwin in a mintty window:

$ uname -a
CYGWIN_NT-10.0-26200 mercury 3.6.9-1.x86_64 2026-04-21 15:46 UTC x86_64
Cygwin
$ mintty --version
mintty 3.8.2 (Cygwin-x86_64)
$ date
Jun  2, 2026 11:36:33
$ python3 -V
Python 3.12.12
$ cat main.py
print("U+01D49E ‹𝒞›  GC=Lu    MATHEMATICAL SCRIPT CAPITAL C")
$ python3 main.py
U+01D49E ‹𝒞›  GC=Lu    MATHEMATICAL SCRIPT CAPITAL C

correctly displays the Mathematical Script Capital C glyph as seen in the
attached screen capture.

If the resolution of this problem changes "our" user experience, that will
be another problem.

As an afterthought, I may be seeing success due to the byte stream
containing UTF-8 4-byte sequences all the way through Windows 11 APIs,
without any conversions to UTF-16 or UTF-32.

I am not so up-to-date with C as to rattle off a demo in seconds, as I can
in Python. I would like to see such a minimal C demo program that I can try
with cmd in WT, with MINGW64 in mintty, and with cygwin in mintty. The demo
should send UTF-8 4-byte, UTF-16 2-short, and UTF-32 1-long characters, if
possible. TIA.

Just my take,
Doug

-- Doug Henderson, Calgary, Alberta, Canada - from gmail.com



--
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple
