[GitHub] [incubator-nuttx] yamt commented on pull request #4754: Revert "sim: Specify -fshort-wchar as NuttX wchar_t is 16-bit"

GitBox Sun, 31 Oct 2021 20:14:13 -0700


yamt commented on pull request #4754:
URL: https://github.com/apache/incubator-nuttx/pull/4754#issuecomment-955891488



   > From https://pubs.opengroup.org/onlinepubs/007908799/xsh/stddef.h.html:
   > 
   > ```
   > wchar_t
   > Integral type whose range of values can represent distinct wide-character
   > codes for all members of the largest character set specified among the locales
   > supported by the compilation environment: the null character has the code value
   > 0 and each member of the Portable Character Set has a code value equal to its
   > value when used as the lone character in an integer character constant.
   > ```
   > 
   > wchar_t is required to hold every possible character code. And from
   > https://www.gnu.org/software/libc/manual/html_node/Extended-Char-Intro.html:
   > 
   > ```
   > Data type: wchar_t
   > This data type is used as the base type for wide character strings. In
   > other words, arrays of objects of this type are the equivalent of char[] for
   > multibyte character strings. The type is defined in stddef.h.
   > 
   > The ISO C90 standard, where wchar_t was introduced, does not say anything
   > specific about the representation. It only requires that this type is capable
   > of storing all elements of the basic character set. Therefore it would be
   > legitimate to define wchar_t as char, which might make sense for embedded
   > systems.
   > 
   > But in the GNU C Library wchar_t is always 32 bits wide and, therefore,
   > capable of representing all UCS-4 values and, therefore, covering all of ISO
   > 10646. Some Unix systems define wchar_t as a 16-bit type and thereby follow
   > Unicode very strictly. This definition is perfectly fine with the standard, but
   > it also means that to represent all characters from Unicode and ISO 10646 one
   > has to use UTF-16 surrogate characters, which is in fact a multi-wide-character
   > encoding. But resorting to multi-wide-character encoding contradicts the
   > purpose of the wchar_t type.
   > ```
   > 
   > To cover all possible Unicode code points, wchar_t requires at least 4 bytes.
   > 
   > So, from the standard's perspective, this patch should be reverted too.
   
   no standard mandates Unicode or how it's implemented, as far as i know.
   `__STDC_ISO_10646__` is optional.
   i feel it makes sense only if we are going to implement a Unicode wchar_t. are we?
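
for reference, a minimal sketch of how one could check what a given toolchain actually provides: whether `__STDC_ISO_10646__` is defined, how wide `wchar_t` is, and whether a single `wchar_t` can hold every code point up to U+10FFFF. the helper names (`wchar_bits`, `wchar_is_iso10646`, `wchar_holds_all_unicode`, `wchar_report`) are made up for this illustration and are not NuttX APIs.

```c
#include <assert.h>
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>   /* wchar_t */
#include <stdint.h>   /* WCHAR_MAX */
#include <stdio.h>

/* Width of wchar_t in bits for this compiler and flag set
 * (for example, -fshort-wchar changes it on GCC).
 */
static int wchar_bits(void)
{
  return (int)(sizeof(wchar_t) * CHAR_BIT);
}

/* 1 if the implementation promises ISO 10646 values in wchar_t, else 0.
 * The macro is optional, so 0 is a conforming answer.
 */
static int wchar_is_iso10646(void)
{
#ifdef __STDC_ISO_10646__
  return 1;
#else
  return 0;
#endif
}

/* 1 if one wchar_t can hold any code point up to U+10FFFF
 * without resorting to surrogate pairs, else 0.
 */
static int wchar_holds_all_unicode(void)
{
  return WCHAR_MAX >= 0x10FFFF;
}

/* Hypothetical helper: print a short summary of the three checks. */
void wchar_report(void)
{
  assert(wchar_bits() >= CHAR_BIT);

  printf("wchar_t bits:            %d\n", wchar_bits());
  printf("__STDC_ISO_10646__ set:  %d\n", wchar_is_iso10646());
  printf("holds up to U+10FFFF:    %d\n", wchar_holds_all_unicode());
}
```

calling `wchar_report()` from a small `main()` with and without `-fshort-wchar` would show the difference between the two configurations. the output depends on the toolchain, so none is listed here.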
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

