On Tuesday 10 November 2009 10:33:07 Florian Klaempfl wrote:
> >
> > So please don't destroy this ideal solution by dropping current FPC
> > UnicodeString in favour of the Delphi string which is complicated,
>
> Who says that? If you don't mess with code pages, the only difference
> you might see is that UnicodeString gets two new fields: encoding and
> char size. However, this information is usually only used if you pass
> the string to a RawString parameter. Normal UnicodeString routines
> initialize these fields and that's it.
>
I can confirm there is not much overhead for the new UnicodeString. I was
misled by the Delphi {$stringchecks on} option and by a misinterpreted
comment from an FPC developer that it is not possible to check codepage
compatibility at compile time, sorry for that.

Some guesswork gained from my experiments with the cpstrnew branch
(Win32, Russian locale, source in utf-8, {$codepage utf8}); please correct
me if I am wrong:

UnicodeString
- always utf-16 encoded.
- str:= 'abc'; length(str) = 6, stringcodepage(str) = 1200.
- str:= 'abä'; length(str) = 6, stringcodepage(str) = 1200.
- no encoding checks on concatenation; concatenation does not work
  because of the wrong length() value.
- setlength() of empty string creates CP 1200.

UTF8String
- str:= 'abc'; length(str) = 3, stringcodepage(str) = 65001.
- str:= 'abä'; length(str) = 4, stringcodepage(str) = 65001.
  Runtime widestringmanager.Wide2AnsiMoveProc().
- encoding checked by concatenation.
- utf8string:= utf8string + '123' needs conversion to UnicodeString
  and back.
- setlength() of empty string creates CP 1251.

String<1251>
- str:= 'abc'; length(str) = 3, stringcodepage(str) = 65001.
- str:= 'abä'; length(str) = 3, stringcodepage(str) = 1251.
  Runtime widestringmanager.Wide2AnsiMoveProc().
- str:= 'abc'; str:= str + '123'; needs conversion to UnicodeString
  and back.
- setlength() of empty string creates CP 1251.

AnsiString
- str:= 'abc'; length(str) = 3, stringcodepage(str) = 65001.
- str:= 'abä'; length(str) = 0, stringcodepage(str) = 1251.
  Runtime widestringmanager.Wide2AnsiMoveProc().
- str:= 'abc'; str:= str + '123'; needs conversion to UnicodeString
  and back.
- setlength() of empty string creates CP 1251.

RawByteString
- str:= 'abc'; length(str) = 3, stringcodepage(str) = 65001.
- str:= 'abä'; length(str) = 0, stringcodepage(str) = 1251.
  Runtime widestringmanager.Wide2AnsiMoveProc().
- str:= 'abc'; str:= str + '123'; needs conversion to UnicodeString
  and back.
- setlength() of empty string creates CP 1251.
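
For anyone who wants to repeat the per-type measurements, a minimal probe
along the following lines should be enough. It is only a sketch, not the
exact test code used for the figures above. Assumptions: the source file is
saved as utf-8, the compiler accepts the `type AnsiString(1251)` spelling
for the String<1251> case (the cpstrnew branch may spell it differently),
and the type name TCP1251String is made up for this example. No expected
output is given because it depends on the compiler version and the system
locale.

```pascal
program cpprobe;
{$mode objfpc}{$H+}
{$codepage utf8}  // assumption: this file is saved as utf-8

type
  // Hypothetical name standing in for String<1251>.
  TCP1251String = type AnsiString(1251);

var
  us:  UnicodeString;
  u8:  UTF8String;
  cs:  TCP1251String;
  ans: AnsiString;
  raw: RawByteString;
begin
  // Same non-ASCII constant assigned to every string type.
  us  := 'abä';
  u8  := 'abä';
  cs  := 'abä';
  ans := 'abä';
  raw := 'abä';

  // Print the reported length and the code page stored with each string.
  WriteLn('UnicodeString  length=', Length(us),  ' cp=', StringCodePage(us));
  WriteLn('UTF8String     length=', Length(u8),  ' cp=', StringCodePage(u8));
  WriteLn('String<1251>   length=', Length(cs),  ' cp=', StringCodePage(cs));
  WriteLn('AnsiString     length=', Length(ans), ' cp=', StringCodePage(ans));
  WriteLn('RawByteString  length=', Length(raw), ' cp=', StringCodePage(raw));
end.
```

Replacing the constant with 'abc' gives the pure-ASCII rows of the lists
above.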

- utf8str1:= 'abc'; cp1251str1:= utf8str1; needs conversion to
  UnicodeString and back.
- utf8str1:= 'abc'; ansistr1:= utf8str1; no conversion.
  CP ansistr1 = 65001.
- ansistr1:= 'abc'; utf8str1:= ansistr1; no conversion.
  CP utf8str1 = 1251.

What are the differences between AnsiString and RawByteString?

Please report when you think the cpstrnew branch is stable enough to be
tested with MSEgui.

Thanks, Martin
_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel