On 01.06.2026 at 17:34, Jakob Bohm via Cygwin wrote:
Dear list,
Having read through the recent debate around the wcwidth() POSIX API,
wchar_t definitions, gcc-16 and cygwin, I have an idea not
mentioned in the list so far:
Using the C11 types char32_t and char16_t (from <uchar.h>), the
situation can be summarized as follows:
- Many, but not all POSIX systems define wchar_t as char32_t and thus
wint_t as uint_least32_t
- Win32 and thus Cygwin defines wchar_t as char16_t and thus wint_t as
uint_least16_t
- All systems considered treat wchar_t as Unicode, with Win32 supporting
UTF-16 since NT 5.0 (Windows 2000).
- For char16_t/UTF-16, wcwidth() should use the high surrogate to
determine the range of Unicode code points and return a width common to
that range, then return 0 for the low surrogates, thereby allowing
computation of string width without having to first assemble surrogates
into full char32_t values. Deciding whether char32_t implementations
should still lump groups of 4 Unicode rows (1024 code points) for UTF-16
compatibility is up to each implementation.
It's a neat idea to split the width calculation over the surrogates.
Unfortunately it does not work this way, because width does not change
on boundaries of the 1024-code-point blocks that a single high surrogate
covers. For example, U+1F4FC is Wide (W), U+1F4FD and U+1F4FE are
Neutral (N), i.e. narrow, and U+1F4FF is W again. All four share the
high surrogate 0xD83D.
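
To make the counterexample concrete, here is a small C sketch that splits a
supplementary code point into its UTF-16 surrogates and computes the range
covered by one high surrogate. The helper names (high_surrogate,
low_surrogate, block_first, block_last) are made up for illustration and are
not part of newlib or any other library.

```c
#include <uchar.h>

/* Hypothetical helpers; valid only for code points U+10000..U+10FFFF. */
static char16_t high_surrogate(char32_t cp)
{
    return (char16_t)(0xD800 + ((cp - 0x10000) >> 10));
}

static char16_t low_surrogate(char32_t cp)
{
    return (char16_t)(0xDC00 + ((cp - 0x10000) & 0x3FF));
}

/* First code point of the 1024-code-point block selected by a high surrogate. */
static char32_t block_first(char16_t hi)
{
    return 0x10000 + ((char32_t)(hi - 0xD800) << 10);
}

/* Last code point of that block. */
static char32_t block_last(char16_t hi)
{
    return block_first(hi) + 0x3FF;
}
```

With these, U+1F4FC through U+1F4FF all map to the high surrogate 0xD83D,
whose block spans U+1F400..U+1F7FF, so no single width per high surrogate
can be right for all of them.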
As a variant of your idea, wcwidth() could return width 1 for every high
surrogate, remember it, and if the next call passes a low surrogate,
determine the combined width and return the remainder, either 1 or 0.
Not quite standard behaviour, I suspect, so maybe not a good idea for
the purists, but maybe worth some discussion.
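
A minimal sketch of that stateful variant, to make the discussion easier.
Everything here is hypothetical: stateful_c16width is not an existing
function, and toy_c32width is a stand-in for a real width table that only
knows the four code points from the example above.

```c
#include <uchar.h>

/* Toy stand-in for a real width table (assumption for illustration):
   U+1F4FC and U+1F4FF are wide, everything else counts as width 1. */
static int toy_c32width(char32_t cp)
{
    if (cp == 0x1F4FC || cp == 0x1F4FF)
        return 2;
    return 1;
}

/* Hypothetical stateful width function over UTF-16 code units. */
static int stateful_c16width(char16_t c)
{
    static char16_t pending_high = 0;     /* remembered state, not thread-safe */

    if (c >= 0xD800 && c <= 0xDBFF) {     /* high surrogate */
        pending_high = c;
        return 1;                         /* report 1 column up front */
    }
    if (c >= 0xDC00 && c <= 0xDFFF) {     /* low surrogate */
        if (pending_high == 0)
            return -1;                    /* unpaired low surrogate */
        char32_t cp = 0x10000
                    + ((char32_t)(pending_high - 0xD800) << 10)
                    + (char32_t)(c - 0xDC00);
        pending_high = 0;
        int w = toy_c32width(cp);
        return w > 1 ? w - 1 : 0;         /* 1 column was already reported */
    }
    pending_high = 0;                     /* BMP character: drop stale state */
    return toy_c32width(c);
}
```

Two weaknesses show up immediately: the hidden static state makes the
function non-reentrant, and a zero-width character outside the BMP would be
over-counted by one column, because the 1 returned for the high surrogate
cannot be taken back.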
A practical solution would be for Cygwin/newlib to provide new functions
c16width(), c32width(), c16swidth() and c32swidth(), each being the
explicit-size equivalent of the similarly named wc or wcs function.
Then wcwidth() can be a trivial inline alias of the explicit-size
equivalent for the compile target, by having the newlib header check a
compiler or standard define that indicates the chosen size of wchar_t.
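// possible wchar.h snippet
//
// C11+ required (char16_t and char32_t come from <uchar.h>)
// For C2Y+ this should go in uchar.h
//
int c16width(char16_t c);
int c32width(char32_t c);
int c16swidth(const char16_t *s, size_t n);
int c32swidth(const char32_t *s, size_t n);
// ...
// This belongs in wchar.h for C1x- compat
//
// static inline, so that no separate external definition is needed
#if SOMETHING_MEANING_16bit_WCHAR_T
static inline int wcwidth(wchar_t c)
{
    return c16width(c);
}
static inline int wcswidth(const wchar_t *s, size_t n)
{
    // cast in case wchar_t and char16_t are distinct 16-bit types
    return c16swidth((const char16_t *)s, n);
}
#else
static inline int wcwidth(wchar_t c)
{
    return c32width(c);
}
static inline int wcswidth(const wchar_t *s, size_t n)
{
    // cast in case wchar_t and char32_t are distinct 32-bit types
    return c32swidth((const char32_t *)s, n);
}
#endif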
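
For the string variants, pairing the surrogates inside the function avoids
the per-character problem entirely. The following is only a sketch of how a
c16swidth() could look, under my own assumptions: sketch_c16swidth and
sketch_c32width are illustrative names, the width table is a toy, and
returning -1 for an unpaired surrogate is a choice made here, not something
the proposal specifies.

```c
#include <stddef.h>
#include <uchar.h>

/* Toy stand-in for a real c32width() (assumption for illustration):
   U+1F4FC and U+1F4FF are wide, everything else counts as width 1. */
static int sketch_c32width(char32_t cp)
{
    if (cp == 0x1F4FC || cp == 0x1F4FF)
        return 2;
    return 1;
}

/* Sums column widths over at most n code units, stopping at a 0 terminator.
   Returns -1 if a surrogate is unpaired or its partner lies beyond n. */
static int sketch_c16swidth(const char16_t *s, size_t n)
{
    int total = 0;

    for (size_t i = 0; i < n && s[i] != 0; i++) {
        char32_t cp = s[i];

        if (cp >= 0xD800 && cp <= 0xDBFF) {          /* high surrogate */
            if (i + 1 >= n || s[i + 1] < 0xDC00 || s[i + 1] > 0xDFFF)
                return -1;                           /* no low surrogate follows */
            cp = 0x10000 + ((cp - 0xD800) << 10)
               + (char32_t)(s[i + 1] - 0xDC00);
            i++;                                     /* consume the low surrogate */
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return -1;                               /* stray low surrogate */
        }
        total += sketch_c32width(cp);
    }
    return total;
}
```

As for the header condition, WCHAR_MAX from <wchar.h> or <stdint.h> is one
standard macro that could be tested (for example WCHAR_MAX <= 0xFFFF), though
whether that is the right check for newlib is an open question.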
Enjoy
Jakob