On 08/27/2016 12:05 AM, Assaf Gordon wrote:
> Regarding wchar_t == UCS:
> And so, the question becomes:
> When the locale is "UTF-8", is the internal representation of 'wchar_t'
> identical to UCS2 or UCS4 (i.e. unicode code-points).
> While the standard explicitly says this can not be assumed,
> I think in practice it is always the case.
>
> It is so in glibc and musl-libc,
> and in OpenBSD,FreeBSD,NetBSD with "UTF-8" locales (but not in non-utf8
> locales).

But not in Cygwin, where wchar_t is 2 bytes, and where Cygwin already
supports surrogate pairs in wchar_t to represent Unicode characters beyond
0xffff. Such a representation violates the POSIX definition of wchar_t,
which is supposed to encode every possible character in a single wchar_t
value. It was nevertheless deemed a better solution than limiting Cygwin
to BMP characters only, and it only affects code that explicitly uses
characters outside the BMP.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
