Hi Doug,

On 02.06.2026 at 20:43, Doug Henderson via Cygwin wrote:
  On Tue, Jun 2, 2026 at 1:46 AM Thomas Wolff via Cygwin <[email protected]>
wrote:

Thomas Wolff, et al.,

I have been carefully following this discussion of technical issues, API
conformance, and related issues.

There has been no significant discussion of the user experience that
depends on the resolution of this problem.

Will non-BMP Unicode code points display correctly in terminal windows
(that use an appropriate font), e.g. mintty?
I noticed the problem when I compiled mintty on a system where I had
installed the test release of gcc 16: various display widths were wrong,
even within the BMP. So a fix was needed. It turned out that the issue was
caused by what I still consider a wrong sign extension of a 16-bit value to
a 32-bit parameter by gcc 16, where previous gcc versions had applied only
unsigned (zero) extension. For subtle reasons this was not accepted as a bug
in gcc, so the Cygwin API had to be adjusted.

As for non-BMP characters, requesting their width with the wcwidth function
never worked. (My first plan was to make it work, but this was rejected as
not standard-conformant, and it would not have been portable anyway, so it
would have been of little use.) If a program wants to determine the locale
width of a non-BMP character on a UTF-16-based system, it needs to split the
character into its Unicode surrogates and use the wcswidth function. mintty
already did this, so there was no non-BMP-specific problem. mintty handles
this under the hood; applications can of course send UTF-8 4-byte sequences
to the terminal.
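
To illustrate the surrogate approach described above, here is a minimal C
sketch. It is not taken from mintty or Cygwin; the function names to_utf16
and codepoint_width are made up for this example. It assumes that wcswidth
accepts a surrogate pair on systems with a 16-bit wchar_t, and it falls back
to plain wcwidth where wchar_t is wide enough to hold any code point.

```c
#define _XOPEN_SOURCE 700   /* exposes wcwidth()/wcswidth() on glibc */
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical helper: split a Unicode code point into UTF-16 code units.
 * Returns the number of units written (1 or 2), or 0 for an invalid
 * code point (above U+10FFFF or inside the surrogate range). */
size_t to_utf16(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}

/* Hypothetical helper: column width of a code point in the current locale,
 * or -1 if it is invalid or not printable in that locale. */
int codepoint_width(uint32_t cp)
{
#if WCHAR_MAX <= 0xFFFF
    /* 16-bit wchar_t (e.g. Cygwin): pass the UTF-16 units to wcswidth(). */
    uint16_t units[2];
    wchar_t w[2];
    size_t n = to_utf16(cp, units);
    if (n == 0)
        return -1;
    for (size_t i = 0; i < n; i++)
        w[i] = (wchar_t)units[i];
    return wcswidth(w, n);
#else
    /* Wide wchar_t (e.g. Linux): the code point fits into one wchar_t. */
    if (cp > 0x10FFFF)
        return -1;
    return wcwidth((wchar_t)cp);
#endif
}
```

A caller would first select a UTF-8 locale with setlocale(LC_ALL, "") and
then call, for example, codepoint_width(0x1D49E). The result depends on the
locale and on the C library's width tables.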

Non-BMP Unicode code points include emojis, mathematical script glyphs, and 
many others.

For me, I care if a small Python script like:

$ type main.py
print("U+01D49E ‹𝒞›  GC=Lu    MATHEMATICAL SCRIPT CAPITAL C")

outputs:
U+01D49E ‹𝒞›  GC=Lu    MATHEMATICAL SCRIPT CAPITAL C
Your test case does not cover the problem. Plain output will always appear
correctly as long as you don't reach the end of the line. The question is
whether, for example, Python has a notion of the current cursor position and
whether that notion is consistent with the output.
Maybe you can try to evaluate that too.

on Windows Terminal, as expected

and when running cygwin in a mintty window:

$ uname -a
CYGWIN_NT-10.0-26200 mercury 3.6.9-1.x86_64 2026-04-21 15:46 UTC x86_64
Cygwin
$ mintty --version
mintty 3.8.2 (Cygwin-x86_64)
$ date
Jun  2, 2026 11:36:33
$ python3 -V
Python 3.12.12
$ cat main.py
print("U+01D49E ‹𝒞›  GC=Lu    MATHEMATICAL SCRIPT CAPITAL C")
$ python3 main.py
U+01D49E ‹𝒞›  GC=Lu    MATHEMATICAL SCRIPT CAPITAL C

correctly displays the Mathematical Script Capital C glyph as seen in the
attached screen capture.

If the resolution of this problem changes "our" user experience, that will
be another problem.

As an afterthought, I may be seeing success due to the byte stream
containing UTF-8 4-byte sequences all the way through Windows 11 APIs,
without any conversions to UTF-16 or UTF-32.

I am not so up-to-date with C as to rattle off a demo in seconds, as I can
in Python. I would like to see such a minimal C demo program that I can try
with cmd in WT, with MINGW64 in mintty, and with cygwin in mintty. The demo
should send UTF-8 4-byte, UTF-16 2-short, and UTF-32 1-long characters, if
possible. TIA.
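
As a possible starting point for the UTF-8 part of such a demo, the
following C sketch encodes a code point as UTF-8 and writes the bytes to a
stream. The names encode_utf8 and put_codepoint are invented for this
example. The UTF-16 and UTF-32 cases are left out because how they reach a
console is platform-specific.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: encode a Unicode code point as UTF-8.
 * Returns the number of bytes written (1 to 4), or 0 for an invalid
 * code point (above U+10FFFF or inside the surrogate range). */
size_t encode_utf8(uint32_t cp, unsigned char out[4])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    /* Non-BMP code points take four bytes. */
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical helper: write one code point to fp as UTF-8.
 * Returns 0 on success, -1 on an invalid code point or a write error. */
int put_codepoint(FILE *fp, uint32_t cp)
{
    unsigned char buf[4];
    size_t n = encode_utf8(cp, buf);
    if (n == 0 || fwrite(buf, 1, n, fp) != n)
        return -1;
    return 0;
}
```

Calling put_codepoint(stdout, 0x1D49E) sends the four bytes F0 9D 92 9E,
which is the UTF-8 form of MATHEMATICAL SCRIPT CAPITAL C.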

Just my take,
Doug

-- Doug Henderson, Calgary, Alberta, Canada - from gmail.com



--
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple
