Michael Schnell wrote:
>> When I have a variable of type AnsiString, and assign a string to it,
>> then its encoding is reported as 1252 (my system codepage). On Paul's
>> machine it will have a different encoding, I assume?
> Via personal consulting ( :) ) I learned that the multiple new Pascal
> string types are just syntactic sugar for a common underlying string
> type that is dynamically typed (and behaves that way). Apparently
> these strings get an appropriate encoding ID when they are allocated,
> and that ID is effective even at zero length.
The encoding is associated with string types, and every variable knows
its type. That is, we have a static encoding, associated with string
types and variables, and a dynamic encoding of the string data, similar
to the static and dynamic types of object references.
> Apparently (contrary to what I assumed) a " := " between new strings
> does not preserve the encoding, but performs a conversion to the
> target's encoding ID.
Right. The encoding etc., as stored in the string header, is used while
processing strings, e.g. in expressions. In the assignment to a variable
the static encoding of that variable must be compared with the dynamic
encoding of the string data, and a conversion must be performed whenever
required.
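
A minimal sketch of such a converting assignment, assuming FPC 3.0 or later. TCp1252String is a name made up for this example, and on Unix the cwstring unit is assumed to supply the conversion routines. If a real conversion takes place, the byte length of the target differs from that of the source:

```pascal
program AssignConv;
{$mode objfpc}{$H+}
{$ifdef unix}
uses cwstring; // assumption: widestring manager needed for conversions on Unix
{$endif}

type
  // Hypothetical strict type with static encoding 1252, for illustration only.
  TCp1252String = type AnsiString(1252);

var
  Src: UTF8String;
  Dst: TCp1252String;
begin
  // Build the UTF-8 byte sequence for a-umlaut at run time, so that the
  // result does not depend on the code page of the source file.
  SetLength(Src, 2);
  Src[1] := AnsiChar($C3);
  Src[2] := AnsiChar($A4);

  // Static encodings differ (65001 vs. 1252), so the assignment converts.
  Dst := Src;

  WriteLn('Src: code page ', StringCodePage(Src), ', bytes ', Length(Src));
  WriteLn('Dst: code page ', StringCodePage(Dst), ', bytes ', Length(Dst));
end.
```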
> So to prevent a conversion, you need to make sure that the target
> has the same (or a compatible) encoding ID as the source. (Either by
> using the appropriate string types
Right, the new string types are *strict* types, declared as
  type UTF8String = type AnsiString(65001);
Note the second "type", denoting a new type, not an alias as in the old
declaration of
  type UTF8String = AnsiString;
> (hoping that the encoding ID has not
> been changed) or by using SetCodePage.)
SetCodePage is applicable only to RawByteString, because this static
type is compatible with all dynamic types - like TObject is compatible
with all derived classes.
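
A small sketch of SetCodePage on a RawByteString, assuming the FPC 3.x System unit (SetCodePage, StringCodePage and CP_UTF8 are taken from there). With Convert set to False the bytes are only relabelled, not transcoded:

```pascal
program RawLabel;
{$mode objfpc}{$H+}

var
  Raw: RawByteString;
begin
  // Two bytes that happen to be the UTF-8 encoding of a-umlaut.
  SetLength(Raw, 2);
  Raw[1] := AnsiChar($C3);
  Raw[2] := AnsiChar($A4);

  // Convert = False: change only the encoding stored in the string header,
  // leaving the bytes untouched.
  SetCodePage(Raw, CP_UTF8, False);

  WriteLn('code page ', StringCodePage(Raw), ', bytes ', Length(Raw));
end.
```

Passing True instead asks for a real conversion from the current dynamic encoding to the requested one.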
> I suppose there is also a function intended to do a "pure"
> assignment that preserves the encoding ID.
Quite unlikely, since this would defeat the idea of static typing.
Low-level hacking is possible, of course, but the effects are
unpredictable. The compiler assumes that the dynamic encoding matches
the static one, and generates code accordingly.
> I suppose a variable of the type "String" is pre-loaded with the
> predefined "System" encoding ID.
No, empty strings are still Nil pointers.
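
This can be checked with a pointer cast, as in the following sketch (assuming FPC 3.x). Without a heap block there is no header that could hold a dynamic encoding, so whatever StringCodePage reports for an empty string cannot come from the string data itself:

```pascal
program EmptyStr;
{$mode objfpc}{$H+}

var
  S: UTF8String;
begin
  S := '';

  // An empty string has no allocated data, so the variable is a nil pointer.
  if Pointer(S) = nil then
    WriteLn('empty string is a nil pointer')
  else
    WriteLn('empty string has a header');

  // Still returns some value, although there is no header to read it from.
  WriteLn('reported code page: ', StringCodePage(S));
end.
```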
DoDi
_______________________________________________
fpc-devel maillist - fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel