There are a couple of reasons to use UTF-16:

(1) The CF/Foundation APIs assume UTF-16. CFStringGetCharacterAtIndex() and CFStringGetCharacters() would be extremely inefficient for anything that isn't ASCII, Latin-1, or UTF-16. Just look at what -base has to do to support UTF-8: it traverses the whole string every time you call -characterAtIndex:.

(2) Almost all ICU APIs use UTF-16.
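To make the cost concrete, here is a minimal sketch (a hypothetical helper, not base's actual implementation) of what character-index lookup into a UTF-8 buffer has to do: because code points are variable-width, every lookup scans from the start, so a loop over -characterAtIndex: goes quadratic. With UTF-16 storage, BMP lookups are a single array index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, assuming well-formed UTF-8 starting on a lead
 * byte: return the code point at character index `idx`.  Each call must
 * walk the buffer from the beginning, so N calls cost O(N^2) overall. */
static uint32_t
code_point_at(const uint8_t *buf, size_t len, size_t idx)
{
    static const uint8_t lead_mask[] = { 0x7F, 0x1F, 0x0F, 0x07 };
    size_t i = 0;

    while (i < len)
    {
        uint8_t b = buf[i];
        /* Sequence length from the lead byte. */
        size_t seq = b < 0x80 ? 1 : b < 0xE0 ? 2 : b < 0xF0 ? 3 : 4;

        if (idx == 0)
        {
            /* Decode: lead-byte bits, then 6 bits per continuation byte. */
            uint32_t cp = b & lead_mask[seq - 1];
            for (size_t k = 1; k < seq; k++)
                cp = (cp << 6) | (buf[i + k] & 0x3F);
            return cp;
        }
        idx--;
        i += seq;
    }
    return 0; /* index out of range */
}
```

For comparison, the UTF-16 equivalent for BMP characters is just `chars[idx]` — which is exactly what CFStringGetCharacterAtIndex() wants to be.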
To address your concern about endianness, I don't think it's a problem at all. The API to the outside world is still the same: we store all strings in the host endianness and export them with a BOM when isExternalRepresentation is specified.

As for libc, I can't use it for much of anything beyond the most basic string functions. Not even printf can be used, because of the %@ specifier.

On Mon, Aug 12, 2013 at 10:31 AM, David Chisnall <[email protected]> wrote:

> On 12 Aug 2013, at 16:26, Stefan Bidi <[email protected]> wrote:
>
> > (2) I'm working towards making corebase use Unicode (i.e. UTF-16)
> > internally wherever possible. I believe this is a saner choice than
> > trying to deal with UTF-8.
>
> I find this an odd observation. UTF-16 is multibyte, so it comes with all
> of the same pain as UTF-8, but has the disadvantage that it's almost
> always larger than UTF-8 (most two-byte characters in UTF-8 are also
> two-byte characters in UTF-16). You also start hitting endian issues with
> UTF-16, whereas UTF-8 is endian-independent. Finally, UTF-8 is the format
> that you typically want for input or output, as it's well supported by
> most libc functions, terminals, and so on.
>
> David
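The export path I described earlier in this message can be sketched like so (a minimal illustration, assuming a hypothetical helper name — not corebase's actual API): strings live in host-endian UTF-16 internally, and the external representation simply gets U+FEFF prepended, so a consumer that reads 0xFFFE knows to byte-swap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper sketching the BOM-on-export idea: copy a
 * host-endian UTF-16 string into `out` with a leading BOM.  Returns the
 * number of UTF-16 code units written, or 0 if `out` is too small. */
static size_t
utf16_external_representation(const uint16_t *chars, size_t len,
                              uint16_t *out, size_t out_cap)
{
    if (out_cap < len + 1)
        return 0;
    out[0] = 0xFEFF; /* BOM, written in host endianness */
    memcpy(out + 1, chars, len * sizeof(uint16_t));
    return len + 1;
}
```

A reader on an opposite-endian machine sees the first two bytes as 0xFFFE and swaps; internally nothing ever needs to care which endianness the host uses.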
_______________________________________________
Gnustep-dev mailing list
[email protected]
https://lists.gnu.org/mailman/listinfo/gnustep-dev
