> I think at most two are required for any target: unicodestring (D2009
> compatibility) and, if really necessary because the unicodestring
> version somehow causes too much overhead, an ansistring($ffff) version as
> well. That's only for the classes, though; I think most of the base RTL can
> simply be ansistring($ffff).
So if I understand correctly, the UnicodeString and AnsiString types must be "extended" so that they also hold information about the actual codepage (encoding) of the string data they contain. (AFAIK, at the moment they hold only a "reference count" and a "size", plus of course the "data".)

I am not an expert, so I do not understand all the aspects/problems involved in proper string handling, but some kind of implicit conversion (based on the actual encoding of the string data) is necessary (ANSI <-> UTF-8 <-> UTF-16 <-> ANSI, etc.).
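To make the idea concrete, here is a minimal Python sketch. It assumes that each string value carries its codepage next to its data, and that assigning it to a string of another codepage converts it implicitly. TaggedString, assign() and the CP_* constants are hypothetical illustrations, not FPC RTL code; the numbers are the usual Windows code page identifiers.

```python
from dataclasses import dataclass

# Windows code page identifiers (used here only as tags).
CP_UTF8 = 65001     # UTF-8
CP_1252 = 1252      # Windows ANSI, Western European
CP_UTF16LE = 1200   # UTF-16 little endian

# Map the tags to Python codec names for the actual conversion.
_CODECS = {CP_UTF8: "utf-8", CP_1252: "cp1252", CP_UTF16LE: "utf-16-le"}


@dataclass(frozen=True)
class TaggedString:
    """Hypothetical model: string bytes plus the codepage they are encoded in."""
    codepage: int
    data: bytes

    def convert(self, target_cp: int) -> "TaggedString":
        """Re-encode into target_cp; return self unchanged if already there."""
        if self.codepage == target_cp:
            return self
        text = self.data.decode(_CODECS[self.codepage])
        return TaggedString(target_cp, text.encode(_CODECS[target_cp]))


def assign(dest_cp: int, src: TaggedString) -> TaggedString:
    """Model of the implicit conversion done when assigning between string types."""
    return src.convert(dest_cp)
```

For example, assign(CP_UTF8, TaggedString(CP_1252, b"\x80")) yields the bytes E2 82 AC, the UTF-8 form of the Euro sign. Because the codepage travels with the value, the conversion can happen on assignment without the caller having to know the source encoding.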

For example, there is a known problem with the Euro currency symbol. On Windows, it is stored in the CurrencyString global variable using the ANSI codepage, but the LCL (which expects UTF-8) uses it without any explicit conversion. This leads to "?" being displayed instead of "€" (for example in TDBEdit or TDBGrid).
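The byte-level effect can be sketched in a few lines of Python, assuming the ANSI codepage is Windows-1252 (where the Euro sign is the single byte 0x80). The exact placeholder shown depends on the widgetset; Python's decoder substitutes U+FFFD.

```python
# CurrencyString as stored under code page 1252 (assumed): Euro sign = 0x80.
currency_ansi = b"\x80"

# Passing the bytes on as if they were already UTF-8: 0x80 is a lone
# continuation byte, so it is invalid UTF-8 and gets replaced by U+FFFD.
shown_wrong = currency_ansi.decode("utf-8", errors="replace")

# Converting explicitly from the ANSI code page to UTF-8 first gives the
# three-byte sequence E2 82 AC, which decodes to the Euro sign.
currency_utf8 = currency_ansi.decode("cp1252").encode("utf-8")
shown_right = currency_utf8.decode("utf-8")
```

The only missing step is the explicit ANSI to UTF-8 conversion before the value is handed to the LCL.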

Another problem arises when displaying character data in data-aware database controls (TDBEdit, TDBGrid). The data-aware controls (LCL) read data from TField descendants (FCL) using the TField.Text property, which returns a plain "string". Without codepage information, it is not clear whether that is an AnsiString, a UTF8String or a UnicodeString. The LCL expects UTF-8 strings, but that assumption does not hold in all cases (for example with ODBC).
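A minimal Python sketch of what has to happen today, while the string itself carries no codepage. It assumes the connection layer knows which encoding its driver returns. Here an ODBC driver is assumed to deliver Windows-1250 text, and to_lcl_text() is a hypothetical helper, not an FCL or LCL function.

```python
import codecs


def to_lcl_text(raw: bytes, source_encoding: str) -> bytes:
    """Return field text as UTF-8 bytes, the form the LCL expects.

    source_encoding has to be supplied by the caller, because a plain
    "string" result does not carry it.
    """
    if codecs.lookup(source_encoding).name == "utf-8":
        return raw  # already UTF-8: pass through unchanged
    return raw.decode(source_encoding).encode("utf-8")


# Assumed ODBC result: "Čaj" in Windows-1250, where Č is the byte 0xC8.
odbc_raw = b"\xc8aj"
lcl_text = to_lcl_text(odbc_raw, "cp1250")  # b"\xc4\x8caj" (Č = C4 8C in UTF-8)
```

Without the conversion, the LCL would read 0xC8 as the start of a UTF-8 sequence that never completes, and the first character would be lost. If the codepage were stored in the string, this bookkeeping outside the value would not be needed.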

-Laco.
_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel
