Graeme Geldenhuys wrote:
On 2011-02-15 16:32, Hans-Peter Diettrich wrote:
Do you realize the problems that may result from the different char type
of such a target-specific string type?

Please do share your thoughts...

In the past, most people were sure that they were using an SBCS, where every character on screen is one char in memory. Consequently they use indexed access to the chars in a string, and for...to loops over them. The same procedures may still work for UTF-16, where most characters also correspond to a single widechar, but this code will fail miserably, and without any compiler warning, on a UTF-8 platform, where a single (visual) character can consist of several chars.
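
To make the failure concrete, here is a minimal sketch. It is written in Python only because its bytes type makes the individual code units visible; the helper names naive_char_at and is_valid_utf8 are made up for this illustration and are not part of any library. It mimics "element i of the string" for the three kinds of encoding:

```python
# Sketch: one text, three encodings, and naive "s[i]" style access.
text = "Gr\u00fc\u00dfe"  # "Gruesse" with u-umlaut and sharp s: 5 visible characters

sbcs = text.encode("latin-1")     # single-byte charset: 1 byte per character
utf16 = text.encode("utf-16-le")  # 2-byte code units
utf8 = text.encode("utf-8")       # 1-byte code units, variable length per character


def naive_char_at(units: bytes, index: int, unit_size: int) -> bytes:
    """Return code unit number `index`, the way old indexed code would."""
    start = index * unit_size
    return units[start:start + unit_size]


def is_valid_utf8(data: bytes) -> bool:
    """True if `data` decodes as complete, well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The u-umlaut is the third character, i.e. index 2.
sbcs_char = naive_char_at(sbcs, 2, 1).decode("latin-1")      # whole character
utf16_char = naive_char_at(utf16, 2, 2).decode("utf-16-le")  # whole character
utf8_unit = naive_char_at(utf8, 2, 1)                        # only a lead byte

print(len(sbcs), len(utf16) // 2, len(utf8))  # code units per encoding
print(is_valid_utf8(utf8_unit))               # the lone unit is not a character
```

The SBCS and UTF-16 cases return the expected character, while the UTF-8 case silently returns half of one. Note that the UTF-16 case only works here because both special characters are in the BMP; with surrogate pairs it would break in the same way.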

That's one reason why I think that using any char type together with strings should be disallowed in portable code. Such restrictions can only be applied to specific string types if these are strictly distinct from the old ShortStrings and AnsiStrings.

It would be nice, of course, for old-style code, to have strings with a known (app-specific) and *immutable* encoding. String handling with such a target-independent string type would work properly on any target, as long as the contents match the coder's expectations. In COBOL such strings were for "usage computational", in contrast to "usage display" with a target-specific encoding.


I must add, that I would be very surprised if Embarcadero doesn't use
native encoded string types for the "unicode string" support in the
upcoming Delphi under Windows (UTF-16), Linux (UTF-8), Mac (UTF-8) etc.
I'm not 100% sure about the default Mac encoding, but seeing that it
comes from FreeBSD, I would guess UTF-8 there too.

AFAIK the UnicodeString allows for any dynamic encoding, be it SBCS, MBCS or UTF-8/16. The element (char) size and encoding have become part of every Unicode string descriptor.


As for saving text to file... It is common practice to use UTF-8 in
such cases, because UTF-8 is the perfect encoding for streaming. Hence
the W3C also said that all HTML, XML etc. should preferably be in UTF-8.

Right, UTF-8 is the recommended external representation of text. No byte order problems, no conversion losses...
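
Both points can be checked with a few lines. Again this is only a sketch in Python, with variable names chosen for the illustration: UTF-16 yields two different byte streams for the same text depending on byte order, UTF-8 yields exactly one, and UTF-8 round-trips a character that a single-byte code page has to replace.

```python
# Sketch: byte order and conversion loss, UTF-16 and Latin-1 versus UTF-8.
text = "Z\u00fcrich"  # "Zurich" with u-umlaut

utf16_le = text.encode("utf-16-le")  # little-endian byte stream
utf16_be = text.encode("utf-16-be")  # big-endian byte stream
utf8 = text.encode("utf-8")          # only one possible byte stream

# Reading little-endian data as big-endian yields different text.
misread = utf16_le.decode("utf-16-be")

# Round trip through UTF-8 is lossless for any Unicode text.
euro = "\u20ac"  # the euro sign, which Latin-1 cannot represent
utf8_round_trip = euro.encode("utf-8").decode("utf-8")
latin1_lossy = euro.encode("latin-1", errors="replace")

print(utf16_le != utf16_be)     # byte order matters for UTF-16
print(misread == text)          # wrong byte order does not give the text back
print(utf8_round_trip == euro)  # UTF-8 preserves the character
print(latin1_lossy)             # the code page substitutes a placeholder
```

This is why UTF-16 files normally need a byte order mark or an out-of-band declaration, while UTF-8 needs neither.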

DoDi


--
_______________________________________________
Lazarus mailing list
[email protected]
http://lists.lazarus.freepascal.org/mailman/listinfo/lazarus
