On 01.06.2026 at 17:34, Jakob Bohm via Cygwin wrote:
Dear list,

Having read through the recent debate around the wcwidth() POSIX API,
wchar_t definitions, gcc-16 and Cygwin, I have an idea not
mentioned on the list so far:

Using C17 types char32_t and char16_t, the situation can be
summarized as follows:

- Many, but not all POSIX systems define wchar_t as char32_t and thus
wint_t as uint_least32_t

- Win32 and thus Cygwin define wchar_t as char16_t and thus wint_t as
uint_least16_t

- All systems considered treat wchar_t as Unicode, with Win32 supporting
 UTF-16 since NT 5.0 (Windows 2000).

- For char16_t/UTF-16, wcwidth() should use the high surrogate to
 determine the range of Unicode symbols and return a width common to
 that range, then return 0 for the low surrogates, thereby allowing
 computation of string width without having to first assemble surrogates
 into full char32_t values.  Deciding if char32_t implementations should
 still lump groups of 4 Unicode rows for UTF-16 compatibility is up to
 each implementation.

It's a neat idea to split the width calculation over the surrogates.
Unfortunately, it does not work this way, because width does not change
in whole blocks of 1024 code points, which is the range a single high
surrogate covers. For example, U+1F4FC is Wide (W), U+1F4FD and U+1F4FE
are Neutral (N), i.e. narrow, and U+1F4FF is W again. As a variant of
your idea, wcwidth() could return width 1 for every high surrogate,
remember it, and, if the subsequent invocation is a low surrogate,
determine the combined width and return the remainder, either 1 or 0.
Not quite standard behaviour, I suspect, so maybe not a good idea for
the purists, but maybe worth some discussion.
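A minimal sketch of that stateful variant, just to make the discussion concrete. All names here (toy_c32width, stateful_c16width, pending_high) are hypothetical, and the width lookup is a toy stand-in that only knows the code points from the example above, not a real East Asian Width table. Returning -1 for an unpaired low surrogate is my assumption as well.

```c
#include <uchar.h>

/* Toy stand-in for a real width table (assumption): treats only
 * U+1F4FC and U+1F4FF as wide, everything else as one column. */
static int toy_c32width(char32_t cp)
{
    if (cp == 0x1F4FC || cp == 0x1F4FF)
        return 2;
    return 1;
}

/* Remembered high surrogate; 0 means none is pending. */
static char16_t pending_high;

/* Hypothetical stateful width function for UTF-16 code units. */
static int stateful_c16width(char16_t c)
{
    if (c >= 0xD800 && c <= 0xDBFF) {
        /* High surrogate: remember it and report one column for now. */
        pending_high = c;
        return 1;
    }
    if (c >= 0xDC00 && c <= 0xDFFF) {
        if (pending_high == 0)
            return -1;          /* unpaired low surrogate (assumption) */
        char32_t cp = 0x10000
            + ((char32_t)(pending_high - 0xD800) << 10)
            + (char32_t)(c - 0xDC00);
        pending_high = 0;
        /* One column was already reported for the high surrogate. */
        return toy_c32width(cp) - 1;
    }
    pending_high = 0;
    return toy_c32width(c);
}
```

Note that U+1F4FC and U+1F4FD share the high surrogate 0xD83D, so only the low surrogate can settle the total. The hidden state also makes the function non-reentrant, which is part of why it would not be standard behaviour.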


A practical solution would be for Cygwin/newlib to provide new functions
c16width(), c32width(), c16swidth() and c32swidth(), each being the
explicit-size equivalent of the similarly named wc or wcs function.
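To illustrate what a c16swidth() might do internally, here is a rough sketch that assembles surrogate pairs before looking up the width. The names sketch_c32width and sketch_c16swidth are hypothetical, the width lookup is a toy stand-in that knows only two wide code points, and returning -1 for a broken surrogate pair is an assumption modelled on how wcswidth() reports non-printable characters.

```c
#include <stddef.h>
#include <uchar.h>

/* Toy stand-in for a real width table (assumption): treats only
 * U+1F4FC and U+1F4FF as wide, everything else as one column. */
static int sketch_c32width(char32_t cp)
{
    if (cp == 0x1F4FC || cp == 0x1F4FF)
        return 2;
    return 1;
}

/* Width of at most n UTF-16 code units, stopping at a NUL unit. */
static int sketch_c16swidth(const char16_t *s, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n && s[i] != 0; i++) {
        char32_t cp = s[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: a low surrogate must follow within n. */
            if (i + 1 >= n || s[i + 1] < 0xDC00 || s[i + 1] > 0xDFFF)
                return -1;
            cp = 0x10000 + ((cp - 0xD800) << 10)
                 + (char32_t)(s[i + 1] - 0xDC00);
            i++;                /* consume the low surrogate too */
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return -1;          /* stray low surrogate */
        }
        total += sketch_c32width(cp);
    }
    return total;
}
```

With the string variant the pair is always visible in one call, so no hidden state is needed; the difficulty is confined to the single-unit c16width().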

Then wcwidth() can be a trivial inline alias of the explicit-size
equivalent for the compile target, by having the newlib header check a
compiler or standard define indicating the chosen size of wchar_t.

// possible wchar.h snippet
//
// C17+ required
// For C2Y+ this should go in uchar.h
//
int c16width(char16_t c);
int c32width(char32_t c);
int c16swidth(const char16_t *s, size_t n);
int c32swidth(const char32_t *s, size_t n);

// ...

// This belongs in wchar.h for C1x- compat
//
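#if SOMETHING_MEANING_16bit_WCHAR_T
// wchar_t holds UTF-16 code units (e.g. Cygwin)
// static inline: usable from a header without a separate external definition
static inline int wcwidth(wchar_t c)
{
  return c16width(c);
}
static inline int wcswidth(const wchar_t *s, size_t n)
{
  // cast: wchar_t and char16_t are distinct typedefs of the same size
  return c16swidth((const char16_t *)s, n);
}
#else
// wchar_t holds full 32-bit code points
static inline int wcwidth(wchar_t c)
{
  return c32width(c);
}
static inline int wcswidth(const wchar_t *s, size_t n)
{
  // cast: wchar_t and char32_t may differ in signedness
  return c32swidth((const char32_t *)s, n);
}
#endif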
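One possible way to fill in the SOMETHING_MEANING_16bit_WCHAR_T placeholder, as a sketch only: the standard WCHAR_MAX macro from <stdint.h> / <wchar.h> is required to be usable in #if directives, so the header could test it directly. The macro name SKETCH_WCHAR_IS_16BIT is hypothetical, and whether newlib would prefer a compiler-specific define instead is not something I can say.

```c
#include <stdint.h>
#include <wchar.h>

/* WCHAR_MAX is 0xFFFF (or 0x7FFF if signed) for a 16-bit wchar_t,
 * and much larger for a 32-bit wchar_t. */
#if WCHAR_MAX <= 0xFFFF
#define SKETCH_WCHAR_IS_16BIT 1
#else
#define SKETCH_WCHAR_IS_16BIT 0
#endif
```

The result should come out as 1 on Cygwin and 0 on typical Linux targets.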


Enjoy

Jakob

