12.10.2011 16:34, Hans-Peter Diettrich wrote:
Alex Shishkin wrote:
1) Why was UTF8String made incompatible with AnsiString(CP_UTF8)
(UTF8String = type AnsiString(CP_UTF8);)? Why not an alias?

An alias allows strings of *any* encoding to be assigned, with possibly
fatal consequences. A strict UTF8String type allows for implicit conversion
whenever required, so that such a string can contain nothing but UTF-8
encoded characters.
So if I declare "MyString : AnsiString(CP_UTF8)" and assign a win1251-encoded string to it, no conversion will be made? That is strange.
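
To make the concern concrete, here is a minimal byte-level sketch of the two possible outcomes. It is written in Python purely as an illustration: Python bytes stand in for the string payload, the sample text is an assumption, and nothing here claims which of the two cases FPC or Delphi actually implements.

```python
# Illustration only: Python bytes stand in for a Pascal string payload.
text = "Привет"

# Payload as it would be stored in a win1251-encoded string.
cp1251_bytes = text.encode("cp1251")

# Case 1: no conversion. The same bytes are merely relabelled as UTF-8.
try:
    cp1251_bytes.decode("utf-8")
    relabel_ok = True
except UnicodeDecodeError:
    relabel_ok = False  # the payload is not valid UTF-8

# Case 2: conversion on assignment. Decode from the source code page,
# then re-encode into the target one.
utf8_bytes = cp1251_bytes.decode("cp1251").encode("utf-8")

print(relabel_ok)                           # False
print(len(cp1251_bytes), len(utf8_bytes))   # 6 12
print(utf8_bytes.decode("utf-8") == text)   # True
```

Relabelling without conversion leaves a byte sequence that is not valid UTF-8 at all, which is the "possibly fatal" case; converting changes the payload (here from 6 to 12 bytes) but keeps the text intact.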


3) Why is UnicodeString a separate type? Shouldn't it be
AnsiString(CP_UTF16)? If not, what is AnsiString(CP_UTF16)?

Delphi only allows for an element size of 1 for AnsiStrings (and
RawByteString). The reason is unclear/undocumented. One reason may be
the type of str[i], which is an AnsiChar for Ansi encoding, and a
WideChar for UTF-16 encoding. Consequently a Char would need 4 bytes
if AnsiStrings with a variable element size were allowed.



4) If AnsiString can now contain text in any supported encoding, I
think this one type is enough to support both single-byte encodings and
Delphi's UTF-16.

AnsiString is only a string type with *native* (system) encoding, i.e.
type AnsiString(0). UTF8String is already a different type
AnsiString(CP_UTF8).

The only thing needed is a modeswitch that maps string to a UTF-16 encoded
_AnsiString_ (!). There is no need for two (or more) RTLs (e.g. for
UTF-8 or UTF-16), because the encoding information is already included in
_any_ longstring.

Right, there exists no *technical* need. But a *practical* need is the
reduction of conversions between strings of different encodings, and
the implementation of procedures that work with strings (e.g. ToUpper).

Procedures that work with strings should use RawByteStrings, and FPC, unlike Delphi, might allow UTF-8 or UTF-16 for RawByteStrings.

4.1) However, for speed reasons, there should internally be separate
code that handles particular encodings quickly and avoids conversions
(for [Lower|Upper]Case, for example), but the interface of the RTL units
should remain unchanged.

The fastest implementation would be a UCS4 string, with no encoding
conversions required inside any library. Conversions would then be
required only in calls to OS or other external library functions.

But UCS4 strings consume significantly more memory.
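
A rough sketch of the size difference, again in Python and only as an illustration. The sample text is an assumption, and only the character payload is counted; any per-string header (length, reference count, code page) is ignored.

```python
text = "Hello, мир"  # 7 ASCII characters plus 3 Cyrillic ones

sizes = {
    "UTF-8": len(text.encode("utf-8")),
    "UTF-16": len(text.encode("utf-16-le")),  # "-le" avoids a BOM
    "UCS4": len(text.encode("utf-32-le")),    # fixed 4 bytes per code point
}

for name, size in sizes.items():
    print(f"{name}: {size} bytes")
# UTF-8: 13 bytes
# UTF-16: 20 bytes
# UCS4: 40 bytes
```

For mostly-ASCII text the UCS4 payload is roughly three to four times larger than UTF-8 and twice as large as UTF-16.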

_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel
