On Fri, Nov 10, 2023 at 08:47:10AM +0200, Eli Zaretskii wrote:
> > Does anybody know if we could just write 'a' instead of U'a' and rely
> > on it being converted?
> >
> > E.g. if you do
> >
> > char32_t c = 'a';
> >
> > then afterwards, c should be equal to 97 (ASCII value of 'a').
>
> Why not?  What could be the problems with using this?
I think what was confusing me was the statement that char32_t holds a UTF-32 encoded Unicode character. I took that to imply a particular byte order: a big-endian UTF-32 'a' would be the byte sequence 00 00 00 61, whereas the value 97 stored on a little-endian machine would be 61 00 00 00. However, it seems that UTF-32 here just means the code point is stored as a 32-bit integer, and the byte order is simply that of the machine; endianness only comes into play when the code points are serialized to a byte stream. Since char32_t is just an integer type, the standard C integer conversions should work when assigning to or from it, I assume.
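
Something like the following illustrates the point (a minimal sketch, assuming a C11 compiler and an ASCII-compatible execution character set; <uchar.h> provides char32_t):

#include <stdio.h>
#include <string.h>
#include <uchar.h>   /* char32_t (C11) */

int main(void)
{
    /* Ordinary integer conversion: the character constant 'a' has the
       int value 97 here, and assigning it to a char32_t just stores
       that number. */
    char32_t c = 'a';

    printf("c == U'a': %s\n", c == U'a' ? "yes" : "no");
    printf("c == 97:   %s\n", c == 97   ? "yes" : "no");

    /* Byte order only matters for a serialized UTF-32 stream; as a
       plain integer in memory, c compares equal to 97 on any machine,
       even though the bytes printed below differ between big- and
       little-endian hosts. */
    unsigned char bytes[sizeof c];
    memcpy(bytes, &c, sizeof c);
    printf("in-memory bytes: %02x %02x %02x %02x\n",
           bytes[0], bytes[1], bytes[2], bytes[3]);

    return 0;
}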
