There are a couple of reasons to use UTF-16:

(1) The CF/Foundation APIs assume UTF-16. CFStringGetCharacterAtIndex() and CFStringGetCharacters() would be extremely inefficient for anything that isn't ASCII, Latin-1, or UTF-16. Just look at what -base has to do to support UTF-8: it traverses the whole string every time you call -characterAtIndex:.

(2) Almost all ICU APIs use UTF-16.
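To make the cost concrete, here is a minimal sketch (a hypothetical helper, not base's actual implementation) of what character-index lookup into a UTF-8 buffer has to do: because code points are variable-width, every lookup scans from the start, so a loop over -characterAtIndex: goes quadratic. With UTF-16 storage, BMP lookups are a single array index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, assuming well-formed UTF-8 starting on a lead
 * byte: return the code point at character index `idx`.  Each call must
 * walk the buffer from the beginning, so N calls cost O(N^2) overall. */
static uint32_t
code_point_at(const uint8_t *buf, size_t len, size_t idx)
{
    static const uint8_t lead_mask[] = { 0x7F, 0x1F, 0x0F, 0x07 };
    size_t i = 0;

    while (i < len)
    {
        uint8_t b = buf[i];
        /* Sequence length from the lead byte. */
        size_t seq = b < 0x80 ? 1 : b < 0xE0 ? 2 : b < 0xF0 ? 3 : 4;

        if (idx == 0)
        {
            /* Decode: lead-byte bits, then 6 bits per continuation byte. */
            uint32_t cp = b & lead_mask[seq - 1];
            for (size_t k = 1; k < seq; k++)
                cp = (cp << 6) | (buf[i + k] & 0x3F);
            return cp;
        }
        idx--;
        i += seq;
    }
    return 0; /* index out of range */
}
```

For comparison, the UTF-16 equivalent for BMP characters is just `chars[idx]` — which is exactly what CFStringGetCharacterAtIndex() wants to be.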
To address your concern about endianness, I don't think it's a problem at all. The API to the outside world is still the same: we store all strings in the host endianness and export them with a BOM when isExternalRepresentation is specified.

As for libc, I can't use it for much of anything beyond the most basic string functions. Not even printf can be used, because of the %@ specifier.

On Mon, Aug 12, 2013 at 10:31 AM, David Chisnall <[email protected]> wrote:

> On 12 Aug 2013, at 16:26, Stefan Bidi <[email protected]> wrote:
>
> > (2) I'm working towards making corebase use Unicode (i.e. UTF-16)
> > internally wherever possible. I believe this is a saner choice than
> > trying to deal with UTF-8.
>
> I find this an odd observation. UTF-16 is multibyte, so it comes with all
> of the same pain as UTF-8, but has the disadvantage that it's almost
> always larger than UTF-8 (most two-byte characters in UTF-8 are also
> two-byte characters in UTF-16). You also start hitting endian issues with
> UTF-16, whereas UTF-8 is endian-independent. Finally, UTF-8 is the format
> that you typically want for input or output, as it's well supported by
> most libc functions, terminals, and so on.
>
> David
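The export path I described earlier in this message can be sketched like so (a minimal illustration, assuming a hypothetical helper name — not corebase's actual API): strings live in host-endian UTF-16 internally, and the external representation simply gets U+FEFF prepended, so a consumer that reads 0xFFFE knows to byte-swap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper sketching the BOM-on-export idea: copy a
 * host-endian UTF-16 string into `out` with a leading BOM.  Returns the
 * number of UTF-16 code units written, or 0 if `out` is too small. */
static size_t
utf16_external_representation(const uint16_t *chars, size_t len,
                              uint16_t *out, size_t out_cap)
{
    if (out_cap < len + 1)
        return 0;
    out[0] = 0xFEFF; /* BOM, written in host endianness */
    memcpy(out + 1, chars, len * sizeof(uint16_t));
    return len + 1;
}
```

A reader on an opposite-endian machine sees the first two bytes as 0xFFFE and swaps; internally nothing ever needs to care which endianness the host uses.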
_______________________________________________
Gnustep-dev mailing list
[email protected]
https://lists.gnu.org/mailman/listinfo/gnustep-dev
