yamt commented on pull request #4754: URL: https://github.com/apache/incubator-nuttx/pull/4754#issuecomment-955891488
> From https://pubs.opengroup.org/onlinepubs/007908799/xsh/stddef.h.html:
>
> ```
> wchar_t
> Integral type whose range of values can represent distinct wide-character codes for all members of the largest character set specified among the locales supported by the compilation environment: the null character has the code value 0 and each member of the Portable Character Set has a code value equal to its value when used as the lone character in an integer character constant.
> ```
>
> wchar_t require to save all possible character encoding. And from https://www.gnu.org/software/libc/manual/html_node/Extended-Char-Intro.html:
>
> ```
> Data type: wchar_t
> This data type is used as the base type for wide character strings. In other words, arrays of objects of this type are the equivalent of char[] for multibyte character strings. The type is defined in stddef.h.
>
> The ISO C90 standard, where wchar_t was introduced, does not say anything specific about the representation. It only requires that this type is capable of storing all elements of the basic character set. Therefore it would be legitimate to define wchar_t as char, which might make sense for embedded systems.
>
> But in the GNU C Library wchar_t is always 32 bits wide and, therefore, capable of representing all UCS-4 values and, therefore, covering all of ISO 10646. Some Unix systems define wchar_t as a 16-bit type and thereby follow Unicode very strictly. This definition is perfectly fine with the standard, but it also means that to represent all characters from Unicode and ISO 10646 one has to use UTF-16 surrogate characters, which is in fact a multi-wide-character encoding. But resorting to multi-wide-character encoding contradicts the purpose of the wchar_t type.
> ```
>
> To cover all possible Unicode encoding, wchar_t require at least 4 bytes.
>
> So, from the standard perspective, this patch should be revert too.

no standard mandates unicode or how it's implemented as far as i know.
`__STDC_ISO_10646__` is optional. i feel it makes sense only if we are going to implement unicode wchar_t. are we?
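
for reference, a minimal sketch of how a given toolchain could be probed for the two properties discussed here: whether `__STDC_ISO_10646__` is defined, and whether a single `wchar_t` is wide enough for any code point up to U+10FFFF. the helper names (`wchar_bits`, `wchar_holds_any_code_point`, `iso_10646_version`, `report_wchar`) are hypothetical and not part of NuttX or this patch; only `WCHAR_MAX`, `CHAR_BIT`, and the optional macro come from standard C.

```c
#include <limits.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: width of wchar_t in bits on this toolchain. */
static int wchar_bits(void)
{
  return (int)(sizeof(wchar_t) * CHAR_BIT);
}

/* Hypothetical helper: nonzero if one wchar_t can hold any value up to
 * 0x10FFFF (the highest Unicode code point). This says nothing about
 * whether the library actually uses a Unicode encoding for wchar_t.
 */
static int wchar_holds_any_code_point(void)
{
  return WCHAR_MAX >= 0x10FFFF;
}

/* Hypothetical helper: the value of the optional __STDC_ISO_10646__
 * macro (a yyyymmL date), or 0 when the implementation does not
 * define it.
 */
static long iso_10646_version(void)
{
#ifdef __STDC_ISO_10646__
  return (long)__STDC_ISO_10646__;
#else
  return 0L;
#endif
}

/* Hypothetical helper: print a one-line summary of the above. */
static void report_wchar(void)
{
  printf("wchar_t: %d bits, holds U+10FFFF: %s, __STDC_ISO_10646__: %ld\n",
         wchar_bits(),
         wchar_holds_any_code_point() ? "yes" : "no",
         iso_10646_version());
}
```

calling `report_wchar()` from a test program shows what the current configuration provides. a wide enough `wchar_t` with the macro left undefined is still conforming, since the macro is optional.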