On Sat, Apr 7, 2018, 10:49 Richard Frith-Macdonald <richard.frith-macdon...@theengagehub.com> wrote:

> On 7 Apr 2018, at 10:21, Ivan Vučica <i...@vucica.net> wrote:
>
> > On Sat, Apr 7, 2018, 09:50 David Chisnall <gnus...@theravensnest.org> wrote:
> >
> > > My current plan is to make the format support ASCII, UTF-8, UTF-16, and
> > > UTF-32, but only generate ASCII and UTF-16 in the compiler and then decide
> > > later if we want to support generating UTF-8 and UTF-32. I also won’t
> > > initialise the hash in the compiler initially, until we’ve decided a bit
> > > more what the hash should be.
> >
> > Emojis don't fit UTF-16. Even if one dismisses CJK, ancient scripts etc,
> > constant strings are not absolutely unlikely to contain emojis.
> >
> > Not supporting UTF-8 for internal storage may be reasonable, but not
> > supporting UTF-32 for strings that require it seems like a bug.
>
> Everything fits in UTF-16 (or UTF-8 for that matter). However it's true
> that many/most emojis don't fit in a *single* 16bit value and require two
> UTF-16 (or multiple 8bit UTF-8 values) to encode them.
> Since the NSString APIs assume a 16bit character width, that means an
> emoji will generally be treated as two characters as far as they are
> concerned, but that's not really a problem and current gnustep-base
> can/does work for emojis (for instance, sending UTF16 to mobile phones).

Acknowledged. I guess I never looked up the representation of characters
with codepoints >64k in UTF-16. Thanks to both for clarification!
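
For reference, here is a minimal sketch of the surrogate-pair encoding described above. It is written in Python purely for illustration; the helper names `to_surrogates` and `utf16_units` are hypothetical and are not part of gnustep-base or the compiler. It shows that a code point above U+FFFF takes two 16-bit units in UTF-16 and four bytes in UTF-8.

```python
def to_surrogates(code_point):
    """Split a code point above U+FFFF into a (high, low) UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point does not need a surrogate pair")
    offset = code_point - 0x10000               # 20 significant bits remain
    high = 0xD800 + (offset >> 10)              # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)             # bottom 10 bits
    return high, low


def utf16_units(text):
    """Return the 16-bit code units of text as encoded in UTF-16 (big-endian, no BOM)."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


emoji = "\U0001F600"                            # GRINNING FACE, U+1F600

print(len(emoji))                               # 1 code point
print([hex(u) for u in utf16_units(emoji)])     # ['0xd83d', '0xde00']
print(len(emoji.encode("utf-8")))               # 4 bytes in UTF-8
```

An API that counts 16-bit units, as the NSString APIs are described as doing above, would report a length of 2 for this string, one per surrogate.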

_______________________________________________
Gnustep-dev mailing list
Gnustep-dev@gnu.org
https://lists.gnu.org/mailman/listinfo/gnustep-dev