Alex Shishkin wrote:
1) Why was UTF8String made incompatible with AnsiString(CP_UTF8)
( UTF8String = type AnsiString(CP_UTF8); )? Why not an alias?

An alias would allow strings of *any* encoding to be assigned, with possibly fatal consequences. A strict UTF8String type allows for an implicit conversion whenever one is required, so that such a string can contain nothing but UTF-8 encoded characters.
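
To make the hazard concrete, here is a minimal sketch in Python (not Delphi code). Python bytes stand in for the string payload, and the sample text and the choice of Windows-1252 as the "other" encoding are assumptions made for the illustration. It contrasts taking the bytes as UTF-8 without conversion with actually converting them:

```python
# Illustration only: Python bytes stand in for the payload of a Delphi string.
text = "é"
ansi_bytes = text.encode("cp1252")      # one byte: 0xE9

# Unchecked assignment: the bytes are simply taken to be UTF-8.
try:
    ansi_bytes.decode("utf-8")
    relabel_ok = True
except UnicodeDecodeError:
    relabel_ok = False                  # 0xE9 on its own is not valid UTF-8

# Converting assignment: decode from the source encoding, re-encode as UTF-8.
utf8_bytes = ansi_bytes.decode("cp1252").encode("utf-8")

print(relabel_ok)    # False
print(utf8_bytes)    # b'\xc3\xa9'
```

Without the conversion step, the "UTF-8" string holds a byte sequence that no UTF-8 consumer can decode.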

2) Same question about RawByteString

Variables of type RawByteString have no fixed encoding. Any AnsiString can be assigned to a RawByteString variable without conversion. But when a RawByteString is assigned to a different string type, a conversion may be necessary.
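
The rule above can be modelled roughly as follows. This is a Python sketch under stated assumptions, not the RTL's actual mechanism: `CodedString`, `assign_to_raw` and `assign_to_typed` are hypothetical names, and a string is reduced to a byte payload plus a code page tag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedString:
    """Hypothetical model: a byte payload plus the code page it is tagged with."""
    data: bytes
    codepage: str


def assign_to_raw(s: CodedString) -> CodedString:
    # A raw-byte variable accepts any payload as-is; the tag travels with it.
    return s


def assign_to_typed(s: CodedString, target: str) -> CodedString:
    # A variable with a fixed encoding needs a conversion when the tags differ.
    if s.codepage == target:
        return s
    return CodedString(s.data.decode(s.codepage).encode(target), target)


latin = CodedString("Grüße".encode("cp1252"), "cp1252")
raw = assign_to_raw(latin)               # no conversion, still 5 bytes
utf8 = assign_to_typed(raw, "utf-8")     # converted, now 7 bytes

print(len(raw.data), raw.codepage)       # 5 cp1252
print(len(utf8.data), utf8.codepage)     # 7 utf-8
```

The point of the model is only that the conversion cost moves from the assignment into the raw variable to the later assignment out of it.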

3) Why is UnicodeString a separate type? Shouldn't it be AnsiString(CP_UTF16)? If not, what is AnsiString(CP_UTF16)?

Delphi only allows an element size of 1 for AnsiStrings (and RawByteString). The reason is unclear/undocumented. One reason may be the type of str[i], which is an AnsiChar for Ansi encodings and a WideChar for UTF-16 encoding. Consequently a Char would need 4 bytes if AnsiStrings with a variable element size were allowed.
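
As a side illustration of the element sizes involved (a Python sketch, not Delphi code; the sample text is an assumption), the same four code points occupy a different number of elements depending on the element width:

```python
text = "aé€😀"                       # 4 code points

utf8 = text.encode("utf-8")          # 1-byte elements
utf16 = text.encode("utf-16-le")     # 2-byte elements
utf32 = text.encode("utf-32-le")     # 4-byte elements

units = {
    "utf-8": len(utf8) // 1,
    "utf-16": len(utf16) // 2,
    "utf-32": len(utf32) // 4,
}
print(units)  # {'utf-8': 10, 'utf-16': 5, 'utf-32': 4}
```

Only with 4-byte elements does the element count equal the number of code points, which is why a single element type covering every encoding would have to be 4 bytes wide.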

4) If AnsiString can now contain text in any supported encoding, I think this one type is enough to support both single-byte encodings and Delphi's UTF-16.

AnsiString by itself is only the string type with the *native* (system) encoding, i.e. type AnsiString(0). UTF8String is already a different type, AnsiString(CP_UTF8).

The only thing needed is a modeswitch that maps string to a UTF-16 encoded _AnsiString_ (!). There is no need for two (or more) RTLs (e.g. for UTF-8 or UTF-16), because the encoding information is already included in _any_ long string.

Right, there is no *technical* need. But there is a *practical* need: the reduction of conversions between strings of different encodings, and the implementation of procedures that work with strings (e.g. ToUpper).

4.1) However, for speed reasons, there should internally be separate code that handles particular encodings quickly and avoids conversions (for [Lower|Upper]Case, for example), but the interface of the RTL units should remain unchanged.

The fastest implementation would be a UCS4 string, with no encoding conversions required inside any library. Conversions would then be required only in calls to OS or other external library functions.
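
One property that makes a fixed-width representation attractive for library code can be sketched in Python (an illustration under assumptions, not a benchmark and not Delphi code; `ucs4_char_at` and `utf16_unit_at` are hypothetical helpers). In UCS4 every element is a complete code point, while a UTF-16 element may be only half of one:

```python
import struct

text = "aé€😀"
ucs4 = text.encode("utf-32-le")
utf16 = text.encode("utf-16-le")


def ucs4_char_at(buf: bytes, i: int) -> str:
    # Fixed width: character i always starts at byte offset 4 * i.
    return buf[4 * i:4 * i + 4].decode("utf-32-le")


def utf16_unit_at(buf: bytes, i: int) -> int:
    # Returns the raw 16-bit element, which may be half of a surrogate pair.
    return struct.unpack_from("<H", buf, 2 * i)[0]


print(ucs4_char_at(ucs4, 3))           # 😀
print(hex(utf16_unit_at(utf16, 3)))    # 0xd83d, a high surrogate
```

A routine working on UCS4 can therefore process element by element, whereas UTF-8 or UTF-16 code has to decode multi-element sequences first.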

DoDi

_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel
