On 28.04.2019 12:35, Jonas Maebe wrote:
> On 28/04/2019 09:55, Ondrej Pokorny wrote:
> It's probably what Delphi does as well. The result is that the
> refcount of a string after such an assignment is currently always one.
Thanks for the answer. Yes, Delphi does the same. But strings are
copy-on-write, so the refcount value doesn't really matter unless you
modify the resulting string through a PChar. Btw, writing to a string
through a PAnsiChar/PChar results in a SIGSEGV in FPC but is OK in
Delphi. Could you explain this?
program AnsiUtf8;
{$ifdef fpc}{$mode delphi}{$else}{$apptype console}{$endif}
var
  Utf8Str: UTF8String;
  Str: AnsiString;
  P: PAnsiChar;
begin
  DefaultSystemCodePage := 65001;
  Utf8Str := 'hello';
  Str := Utf8Str;
  P := PAnsiChar(Str);
  P[1] := 'x'; // SIGSEGV in FPC, OK in Delphi
  Writeln(Str); // writes hxllo in Delphi
  Writeln(Utf8Str); // writes hello in Delphi
end.
The documentation doesn't say anything about it:
https://www.freepascal.org/docs-html/ref/refsu12.html
If changing a string via a PChar is not allowed in FPC, then the
refcount argument is not really valid.
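For comparison, here is a minimal sketch of the pattern I would expect to be safe in both compilers. It assumes that calling UniqueString before taking the pointer gives the variable its own writable copy with a refcount of 1; the program name UniqueDemo is just for illustration, and this is not meant as a statement about why the program above crashes in FPC.

```pascal
program UniqueDemo;
{$ifdef fpc}{$mode delphi}{$else}{$apptype console}{$endif}
var
  Str: AnsiString;
  P: PAnsiChar;
begin
  Str := 'hello';
  // Request a private copy of the string data before writing through
  // a raw pointer (assumption: this detaches Str from any shared or
  // constant data).
  UniqueString(Str);
  P := PAnsiChar(Str);
  // Overwrite the second character; PAnsiChar indexing is zero-based.
  P[1] := 'x';
  Writeln(Str);
end.
```

This only changes the copy owned by Str, so any other string variable that shared the data before the UniqueString call keeps its original contents.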
Another thing: if you do the assignment directly (UTF8String ->
AnsiString), you get a refcount of 1 (=a new copy of the string), but if
you do the assignment via a RawByteString (UTF8String -> RawByteString
-> AnsiString), you get a refcount of 3 (=the same string):
program AnsiUtf8;
{$ifdef fpc}{$mode delphi}{$else}{$apptype console}{$endif}
var
  Utf8Str: UTF8String;
  RawStr: RawByteString;
  Str: AnsiString;
begin
  DefaultSystemCodePage := 65001;
  Utf8Str := Copy('hello', 1);
  RawStr := Utf8Str;
  Str := RawStr;
  Writeln(PInteger(PByte(Str) - 8)^); // write refcount
end.
> I've had my share for now fighting with people who rely on
> implementation details (like this one), so I'd rather not change
> that unless Delphi does it too (and even then we may get complaints
> that FPC is not backwards compatible in this respect).
It's funny to see that the holy mantra of the "implementation detail" is
used once to support a different behavior and the second time to fight it :)
>> See the attached patch.
> Your patch will return an empty string if orgcp is different from both
> cp and CP_NONE.
This is nonsense. I didn't touch the last else-part that is used when
orgcp is different from both cp and CP_NONE. You can easily check yourself:
program AnsiUtf8;
var
  Utf8Str: UTF8String;
  Str: AnsiString;
begin
  DefaultSystemCodePage := 1250;
  Utf8Str := 'hello';
  Str := Utf8Str;
  Writeln(Str);
end.
Ondrej