Re: Multibyte support (round 2)

Eric Blake Mon, 29 Aug 2016 10:13:42 -0700

On 08/27/2016 12:05 AM, Assaf Gordon wrote:

> Regarding wchar_t == UCS:


> And so, the question becomes:
> When the locale is "UTF-8", is the internal representation of 'wchar_t'
> identical to UCS-2 or UCS-4 (i.e. Unicode code points)?
> While the standard explicitly says this cannot be assumed,
> I think in practice it is always the case.
> 
> It is so in glibc and musl-libc,
> and in OpenBSD, FreeBSD and NetBSD with "UTF-8" locales (but not in
> non-UTF-8 locales).

But not in Cygwin, where wchar_t is 2 bytes, and where Cygwin already
uses surrogate pairs in wchar_t to represent Unicode characters beyond
0xffff. Such a representation violates the POSIX definition of wchar_t,
which is supposed to encode every possible character in a single wchar_t
value, but it was deemed a better solution than limiting Cygwin to only
the BMP characters, and it only affects code that explicitly uses
characters outside the BMP.
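
To make the Cygwin case concrete, here is a minimal sketch of what code
has to do if it wants a full Unicode code point on a platform where
wchar_t may hold UTF-16 units. The helper name wc_to_codepoint is
hypothetical (it is not part of any libc or of coreutils), and the sketch
assumes the wide string really is UTF-16 or UCS-4; it does not reject
lone surrogates, it just passes them through.

```c
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical helper, for illustration only.
 *
 * Decode one Unicode code point from the wide string S of length N.
 * Stores the code point in *CP and returns the number of wchar_t units
 * consumed: 2 for a high/low surrogate pair, 1 otherwise, 0 if N is 0.
 *
 * With a 4-byte wchar_t holding UCS-4, well-formed input contains no
 * surrogates, so the pair branch is simply never taken. */
static size_t
wc_to_codepoint (const wchar_t *s, size_t n, uint32_t *cp)
{
  if (n == 0)
    return 0;

  uint32_t hi = (uint32_t) s[0];

  /* High surrogate (0xD800..0xDBFF) followed by a low surrogate
     (0xDC00..0xDFFF): combine them into one code point >= 0x10000.  */
  if (hi >= 0xD800 && hi <= 0xDBFF && n >= 2)
    {
      uint32_t lo = (uint32_t) s[1];
      if (lo >= 0xDC00 && lo <= 0xDFFF)
        {
          *cp = 0x10000 + (((hi - 0xD800) << 10) | (lo - 0xDC00));
          return 2;
        }
    }

  /* BMP character, UCS-4 value, or lone surrogate: passed through.  */
  *cp = hi;
  return 1;
}
```

For example, the two units 0xD83D 0xDE00 decode to the single code point
0x1F600, while a BMP character such as 0x00E9 is returned unchanged and
consumes one unit. Code that never leaves the BMP never hits the pair
branch, which matches the point above that only code explicitly using
characters outside the BMP is affected.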

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
