Re: why is MB_LEN_MAX so large (16) on glibc

Eric Blake Wed, 13 May 2015 18:30:20 -0700

On 05/13/2015 06:30 PM, Bruno Haible wrote:

> The value of 4 is sufficient to accommodate all stateless encodings in
> use, including UTF-8 (which was restricted from max. 6 to 4 bytes by
> an ISO standard) and GB18030. But it's not necessarily future-proof.
> 
>> I was worried that it implied that wctomb() might convert a wide char
>> to _multiple_ encoded chars for some character/encoding combinations?


On Cygwin, where wchar_t is 2 bytes, we have the opposite problem: any
character outside the Basic Multilingual Plane of Unicode (that is,
> 0xffff) requires a surrogate pair of two wchar_t to represent a single
character, which violates the POSIX premise that a wchar_t holds a
character. It makes for some odd behavior with wctomb() and friends, but
it's the best that can be done.
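
To make the surrogate split concrete, here is a rough sketch of the
arithmetic. The helper name to_surrogates() is made up for illustration;
it is not a Cygwin or libc interface, and it only shows the standard
UTF-16 encoding rule, not how any particular wctomb() handles the pair.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: split a code point above U+FFFF into the two
   16-bit units a 2-byte wchar_t would need to carry it. */
void to_surrogates(uint32_t cp, uint16_t *hi, uint16_t *lo)
{
    /* Only supplementary-plane code points need a pair. */
    assert(cp > 0xFFFF && cp <= 0x10FFFF);

    cp -= 0x10000;                            /* 20 significant bits remain */
    *hi = (uint16_t)(0xD800 + (cp >> 10));    /* high (lead) surrogate  */
    *lo = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low (trail) surrogate  */
}
```

For U+1F600 this gives the pair 0xD83D, 0xDE00: one character, but two
wchar_t values.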

If the C11 char16_t and char32_t take off (with the accompanying explosion
in function interfaces), then switching the world to char32_t instead of
wchar_t would be the sane approach for dealing with wide characters.
But I don't know if that is likely to happen.
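
As a sketch of why char32_t is simpler here: one char32_t always holds a
whole code point, so encoding it to UTF-8 is a single step that never
needs more than 4 bytes. The function utf8_from_c32() below is a
hypothetical helper written for this example, not a standard interface
(C11's own conversion function, c32rtomb(), depends on the current
locale), and it assumes <uchar.h> is available.

```c
#include <stddef.h>
#include <uchar.h>

/* Hypothetical helper: encode one code point as UTF-8.
   Returns the number of bytes written (1 to 4), or 0 if c is a
   surrogate or lies beyond U+10FFFF. */
size_t utf8_from_c32(char32_t c, unsigned char out[4])
{
    if (c < 0x80) {
        out[0] = (unsigned char)c;
        return 1;
    }
    if (c < 0x800) {
        out[0] = (unsigned char)(0xC0 | (c >> 6));
        out[1] = (unsigned char)(0x80 | (c & 0x3F));
        return 2;
    }
    if (c < 0x10000) {
        if (c >= 0xD800 && c <= 0xDFFF)
            return 0;                       /* lone surrogate: not a character */
        out[0] = (unsigned char)(0xE0 | (c >> 12));
        out[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (c & 0x3F));
        return 3;
    }
    if (c <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (c >> 18));
        out[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (c & 0x3F));
        return 4;
    }
    return 0;                               /* outside the Unicode range */
}
```

With this, U+1F600 is a single char32_t value that encodes to the four
bytes F0 9F 98 80, with no surrogate bookkeeping on the wide side.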


-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

