On 7 Apr 2018, at 10:21, Ivan Vučica <i...@vucica.net> wrote:
>
> On Sat, Apr 7, 2018, 09:50 David Chisnall <gnus...@theravensnest.org> wrote:
>
> > My current plan is to make the format support ASCII, UTF-8, UTF-16, and
> > UTF-32, but only generate ASCII and UTF-16 in the compiler and then decide
> > later if we want to support generating UTF-8 and UTF-32. I also won’t
> > initialise the hash in the compiler initially, until we’ve decided a bit more
> > what the hash should be.
>
> Emojis don't fit UTF-16. Even if one dismisses CJK, ancient scripts etc,
> constant strings are not absolutely unlikely to contain emojis.
>
> Not supporting UTF-8 for internal storage may be reasonable, but not
> supporting UTF-32 for strings that require it seems like a bug.

UTF-32 is not more expressive than UTF-16, and it’s not more space-efficient either: every Unicode code point can be expressed in either one or two UTF-16 code units, so in the worst case you need the same number of bytes (four) to express a code point in UTF-16 as in UTF-32, and in the best case you need half as many. The only advantage that UTF-32 has is of being a fixed-length encoding of code points, but that isn’t actually very helpful when the APIs all refer to UTF-16 code units (and UTF-32 is not a fixed-length encoding of UTF-16 code units).

David
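
To make the surrogate-pair point concrete, here is a small sketch in Python (not GNUstep or compiler code; the helper name utf16_units is made up for illustration). It splits a code point into UTF-16 code units by hand and compares the encoded sizes of a BMP character and an emoji in both encodings:

```python
def utf16_units(code_point):
    """Return the UTF-16 code units for one Unicode code point (hypothetical helper)."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if code_point < 0x10000:
        # Basic Multilingual Plane: a single 16-bit code unit.
        return [code_point]
    # Supplementary planes: split the 20-bit offset into a surrogate pair.
    offset = code_point - 0x10000
    high = 0xD800 | (offset >> 10)
    low = 0xDC00 | (offset & 0x3FF)
    return [high, low]


for ch in ("A", "\u4e2d", "\U0001F600"):
    units = utf16_units(ord(ch))
    utf16_bytes = len(ch.encode("utf-16-le"))
    utf32_bytes = len(ch.encode("utf-32-le"))
    print(f"U+{ord(ch):04X}: {len(units)} UTF-16 unit(s), "
          f"{utf16_bytes} bytes as UTF-16, {utf32_bytes} bytes as UTF-32")
```

The emoji U+1F600 comes out as the pair D83D DE00, which is four bytes in either encoding, while the two BMP characters take two bytes in UTF-16 and four in UTF-32.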