On 7/21/16, Santiago A. <s...@ciberpiula.net> wrote:

> I've come across this issue: When I concatenate two strings in UTF8 they
> are converted to ansi (Win-1252).

You have declared all string variables as plain "string", which (with
{$H+} in effect) is the same as AnsiString(CP_ACP). So all string
variables have the encoding of your active code page.
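To see which code page each declared string type actually carries at run
time, a small test program along these lines may help. It is only a
sketch, assuming FPC 3.0 or later, where StringCodePage,
DefaultSystemCodePage and CP_UTF8 come from the System unit. The variable
names are made up for illustration, and the numbers printed depend on the
platform and its active code page.

```pascal
{$mode objfpc}{$H+}
// Sketch (assumes FPC 3.0+): print the code page stored in each string.
// The variable names are hypothetical, chosen only for this example.
var
  PlainStr: string;      // with {$H+}: AnsiString(CP_ACP)
  Utf8Str: Utf8String;   // AnsiString(CP_UTF8)
  RawStr: RawByteString; // AnsiString(CP_NONE)
begin
  PlainStr := 'abc';
  Utf8Str := PlainStr;   // converted from the active code page to UTF-8 if they differ
  RawStr := Utf8Str;     // assigning to RawByteString does not convert
  writeln('DefaultSystemCodePage: ', DefaultSystemCodePage);
  writeln('string:        ', StringCodePage(PlainStr));
  writeln('Utf8String:    ', StringCodePage(Utf8Str), ' (CP_UTF8 = ', CP_UTF8, ')');
  writeln('RawByteString: ', StringCodePage(RawStr));
end.
```

On Windows with a Western locale the first line would typically show 1252;
on most Linux systems it is usually 65001.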

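Declare Utf8StrA and the related variables as Utf8String.
In DisplayBytes, do not use "String" as the parameter type, since that
would again convert the argument automatically; a RawByteString
parameter accepts a string in any code page without converting it.
The AnsiToUtf8 call is no longer necessary if you do it this way: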

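{$mode objfpc}{$H+} // plain "string" is AnsiString(CP_ACP), not ShortString

// RawByteString (CP_NONE): the argument is taken as is, not converted
procedure DisplayBytes(S:RawByteString);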
var
  i:Integer;
begin
  Write('  ');
  for i:=1 to length(s) do
    write(ord(s[i]),' ');
  writeln;
end;

//-----------------------------------
// body
//-----------------------------------
var
  AnsiStrA:string;
  AnsiStrB:string;
  Utf8StrA: utf8string;
  Utf8StrB:utf8string;
  Utf8StrConcat:utf8string;
begin
  AnsiStrA:=' ';
  AnsiStrA[1]:=#243; // o acute (ó) in Windows-1252
  AnsiStrB:='A';

  Write('AnsiStrA: ');DisplayBytes(AnsiStrA); // 243
  Write('AnsiStrB: ');DisplayBytes(AnsiStrB); // 65


  Utf8StrA:=AnsiStrA; // implicit CP_ACP -> UTF-8 conversion: 195 179
  Utf8StrB:=AnsiStrB; // 65

  writeln;
  Write('Utf8StrA: ');DisplayBytes(Utf8StrA); // 195 179
  Write('Utf8StrB: ');DisplayBytes(Utf8StrB); // 65

  Write('Utf8StrA+Utf8StrB: ');DisplayBytes(Utf8StrA+Utf8StrB);

  writeln;
  Write('Utf8StrA again: ');DisplayBytes(Utf8StrA); // 195 179
  Write('Utf8StrB again: ');DisplayBytes(Utf8StrB); // 65


  Utf8StrConcat:=Utf8StrA+Utf8StrB;
  writeln;
  Write('Utf8StrConcat: ');DisplayBytes(Utf8StrConcat);
end.

Bart
_______________________________________________
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
