On 28.04.2019 12:35, Jonas Maebe wrote:
> On 28/04/2019 09:55, Ondrej Pokorny wrote:

> It's probably what Delphi does as well. The result is that the refcount of a string after such an assignment is currently always one.

Thanks for the answer. Yes, Delphi does the same. But strings are copy-on-write, so the refcount value doesn't really matter unless you modify the resulting string through a PChar. By the way, writing through a PAnsiChar/PChar results in a SIGSEGV in FPC but works in Delphi. Could you explain this?

program AnsiUtf8;
{$ifdef fpc}{$mode delphi}{$else}{$apptype console}{$endif}
var
  Utf8Str: UTF8String;
  Str: AnsiString;
  P: PAnsiChar;
begin
  DefaultSystemCodePage := 65001;
  Utf8Str := 'hello';
  Str := Utf8Str;
  P := PAnsiChar(Str);
  P[1] := 'x'; // SIGSEGV in FPC, OK in Delphi
  Writeln(Str);     // writes hxllo in Delphi
  Writeln(Utf8Str); // writes hello in Delphi
end.

The documentation doesn't say anything about it: https://www.freepascal.org/docs-html/ref/refsu12.html

If changing a string via a PChar is not allowed in FPC, then the refcount argument is not really valid.

Another thing: if you do the assignment directly (UTF8String -> AnsiString), you get a refcount of 1 (=a new copy of the string), but if you do the assignment via a RawByteString (UTF8String -> RawByteString -> AnsiString), you get a refcount of 3 (=the same string):

program AnsiUtf8;
{$ifdef fpc}{$mode delphi}{$else}{$apptype console}{$endif}
var
  Utf8Str: UTF8String;
  RawStr: RawByteString;
  Str: AnsiString;
begin
  DefaultSystemCodePage := 65001;
  Utf8Str := Copy('hello', 1);
  RawStr := Utf8Str;
  Str := RawStr;
  Writeln(PInteger(PByte(Str) - 8)^); // write refcount
end.


> I've had my share for now of fighting with people who rely on implementation details (like this one), so I'd rather not change that unless Delphi does it too (and even then we may get complaints that FPC is not backwards compatible in this respect).

It's funny to see the holy mantra of the "implementation detail" used one time to justify a different behavior and the next time to argue against it :)


>> See the attached patch.

> Your patch will return an empty string if orgcp is different from both cp and CP_NONE.

This is nonsense. I didn't touch the final else branch, which is the one used when orgcp is different from both cp and CP_NONE. You can easily check it yourself:

program AnsiUtf8;
var
  Utf8Str: UTF8String;
  Str: AnsiString;
begin
  DefaultSystemCodePage := 1250;
  Utf8Str := 'hello';
  Str := Utf8Str;
  Writeln(Str);
end.

Ondrej

