Re: [fpc-devel] TRegistry and Unicode

Yuriy Sydorov Thu, 07 Mar 2019 09:31:00 -0800

On 07.03.2019 18:38, Bart wrote:

On Wed, Mar 6, 2019 at 10:09 PM Yuriy Sydorov <j...@cp-lab.com> wrote:

If you declare a function result as utf8string instead of string
(ansistring), then an automatic conversion will be performed when you
assign the result of the function to a variable of type string
(ansistring). You will get a classic 1-byte-per-character string if your
current encoding is a 1-byte encoding.
I mentioned this earlier.


I know that, but you do not need to assign the function result to
another string to investigate it.
Stupid example:
program test;

function x: utf8string;
var
   u: unicodestring;
begin
   setlength(u,3);
   // my editor is UTF8, hence this workaround instead of u := 'äëï';
   word(u[1]) := $E4;
   word(u[2]) := $EB;
   word(u[3]) := $EF;
   result := utf8encode(u); //äëï but now Utf8Encoded
end;

var
   u8: utf8string;
begin
   u8 := x;
   if byte(u8[1]) = $E4 then writeln('OK') else writeln('Fail');
end.

It prints Fail, where it would have printed OK if x had returned string.

This is a corner case, but it is definitely a regression nevertheless.

Of course, if "u8" is a utf8string, then the first char will be encoded as a 2-byte pair. But if you change "u8" to be just "string" or "ansistring", then the first byte will contain "ä" if the current ansi code page supports it (e.g. cp1252).

It is perfectly backward compatible.
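
A minimal sketch of the difference, assuming FPC 3.0 or later. The type name TCp1252String is made up for this illustration: it pins the target to code page 1252 so that the result does not depend on the system's default ansi code page (with a plain "string" the outcome follows the current code page, as described above). On Unix a widestring manager such as cwstring is needed for the code page conversion.

```pascal
program cpdemo;

{$mode objfpc}{$H+}

uses
  {$ifdef unix}cwstring,{$endif}  // code page conversion support on Unix
  SysUtils;

type
  // Hypothetical name, for illustration only: ansistring fixed to cp1252
  TCp1252String = type AnsiString(1252);

function X: UTF8String;
var
  u: UnicodeString;
begin
  SetLength(u, 3);
  u[1] := WideChar($E4);  // a-umlaut
  u[2] := WideChar($EB);  // e-umlaut
  u[3] := WideChar($EF);  // i-umlaut
  Result := UTF8Encode(u);
end;

var
  u8: UTF8String;
  a: TCp1252String;
begin
  u8 := X;  // same declared code page: the bytes stay UTF-8 encoded
  a := X;   // different declared code page: implicit UTF-8 -> cp1252 conversion
  WriteLn('utf8string: length ', Length(u8),
          ', first byte $', IntToHex(Byte(u8[1]), 2));
  WriteLn('cp1252:     length ', Length(a),
          ', first byte $', IntToHex(Byte(a[1]), 2));
end.
```

If the conversion is available, the utf8string should hold 6 bytes starting with $C3 (the first half of the 2-byte pair), while the cp1252 string should hold 3 bytes starting with $E4.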

Yuriy.
_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
