Leif Ekblad wrote:
IMO, I wouldn't support wide-character (UnicodeString) strings for
anything new. In the beginning the wide-character string had the
advantage of being able to represent all characters with 2 bytes, but
this is no longer the case. I would switch to UTF-8 instead and keep
characters 1 byte long. A switch to UTF-8 only affects a small amount of
the code-base, and doesn't break string references.
UTF-8 is fine for external representation, and for code that's near it.
After all, it's merely a form of compression in the same way that HTTP
etc. uses compression for content.
I think your point about two bytes now being insufficient to represent
all possible Unicode codepoints is valid. But things like expression
parsing are made much more efficient by being able to index an array of
fixed-width elements, so that's an argument for moving to a wider
internal representation, not a narrower one.
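To make the trade-off concrete, here is a minimal sketch in Python (not
FPC code; the function names and the sample string are made up for
illustration). It counts the code units each encoding needs for the same
four codepoints, then fetches the n-th codepoint two ways: by direct
offset in UTF-32, and by walking lead bytes in UTF-8. It assumes
well-formed input and does no validation beyond rejecting a bad lead byte.

```python
import struct

def code_units(text):
    """Return (UTF-8 bytes, UTF-16 units, UTF-32 units) needed for text."""
    return (
        len(text.encode("utf-8")),
        len(text.encode("utf-16-le")) // 2,
        len(text.encode("utf-32-le")) // 4,
    )

def utf8_seq_len(lead):
    """Length in bytes of the UTF-8 sequence that starts with this lead byte."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError("not a UTF-8 lead byte")

def nth_codepoint_utf32(buf, n):
    """Fixed width: codepoint n sits at byte offset 4 * n."""
    return struct.unpack_from("<I", buf, 4 * n)[0]

def nth_codepoint_utf8(buf, n):
    """Variable width: skip n sequences from the start, then decode one."""
    i = 0
    for _ in range(n):
        i += utf8_seq_len(buf[i])
    return ord(buf[i:i + utf8_seq_len(buf[i])].decode("utf-8"))

# U+0061, U+00E9, U+20AC, U+1D11E: 1, 2, 3 and 4 bytes in UTF-8.
sample = "a\u00e9\u20ac\U0001d11e"
```

For this sample, UTF-16 needs a surrogate pair for U+1D11E, so only the
UTF-32 form has exactly one element per codepoint; the UTF-8 lookup has
to scan from the start of the buffer to find the same codepoint.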
--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk
[Opinions above are the author's, not those of his employers or colleagues]
_______________________________________________
fpc-devel maillist - fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel