On 05/13/2015 06:30 PM, Bruno Haible wrote:
> The value of 4 is sufficient to accommodate all stateless encodings in
> use, including UTF-8 (which was restricted from max. 6 to 4 bytes by
> an ISO standard) and GB18030. But it's not necessarily future-proof.
>
>> I was worried that it implied that wctomb() might convert a wide char to
>> _multiple_ encoded chars
>> for some character/encoding combinations?

On Cygwin, where wchar_t is 2 bytes, we have the opposite problem: any
character outside the basic plane of Unicode (that is, > 0xffff) requires
a surrogate pair of two wchar_t units to represent a single character,
which violates the POSIX premise that a wchar_t holds a character. It
makes for some odd behavior with wctomb() and friends, but it's the best
that can be done.

If the C11 char16_t and char32_t take off (with the accompanying explosion
in function interfaces), then switching the world to char32_t instead of
wchar_t would be the sane approach for dealing with wide characters. But I
don't know if that is likely to happen.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
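
To make the surrogate pair point concrete, here is a minimal sketch of
the standard UTF-16 split for a code point above 0xffff. The helper name
utf16_encode is made up for illustration and is not part of Cygwin or any
libc; it uses uint16_t rather than wchar_t so that it behaves the same
regardless of the platform's wchar_t width.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a libc or Cygwin function): encode one
 * Unicode code point as UTF-16.  Returns the number of 16-bit units
 * written to out (1 or 2), or 0 if cp is not a valid scalar value. */
static int utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(out != NULL);

    /* Reject values past U+10FFFF and the surrogate range itself. */
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;

    /* Basic plane: one unit holds the whole character. */
    if (cp <= 0xFFFF) {
        out[0] = (uint16_t)cp;
        return 1;
    }

    /* Supplementary planes: split the 20-bit offset into two halves. */
    cp -= 0x10000;
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate  */
    return 2;
}
```

For example, calling utf16_encode(0x1F600, u) returns 2 and fills u with
0xD83D and 0xDE00, so neither 16-bit unit on its own is a character.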
