I need some help getting to the root of a problem with incorrect results on Apple hardware (M1, aarch64) for the function UTF8LengthFast in lazutf8.

On macOS, when given a string containing one or more multi-byte UTF-8 characters, UTF8LengthFast returns wildly incorrect results. On Fedora, the function returns the correct answer.

On macOS, I'm using fpc 3.3.1 and Lazarus 2.2.0RC3. On Fedora, I'm using fpc 3.2.2-1 and Lazarus 2.0.12-2.

The following small program demonstrates the problem.

% cat utf8len.pas

program utf8len;

{$mode objfpc}{$H+}
{$CODEPAGE UTF8}

uses SysUtils, lazutf8;

const
  s =  '€';
var
  n: PtrInt;
begin
  n := UTF8LengthFast(s);
  writeln('Len='+inttostr(n));
end.

% file utf8len.pas
utf8len.pas: Unicode text, UTF-8 text

On macOS, I compile it with:

% ~/fpc3.3.1/bin/fpc -Sh -Cro -O3 -XX -vewbq -FU. -Fu/usr/local/share/lazarus/components/lazutils/lib/aarch64-darwin utf8len.pas

On Fedora, I compile it with:

$ /usr/bin/fpc -Sh -Cro -O3 -XX -vewbq -FU. -Fu/usr/lib64/lazarus/components/lazutils utf8len.pas

Then run it:

On macOS:

% ./utf8len
Len=-100663283

On Fedora:

$ ./utf8len
Len=1

On macOS, I built fpc 3.3.1 from source, using fpc 3.2.2 as the starting compiler. I then compiled Lazarus using fpc 3.3.1.

Because I built both fpc and Lazarus myself, I may have introduced an error somewhere along the way. To rule that out, can anyone else reproduce this problem?

I have traced through the code using a debugger on both platforms. The same path through UTF8LengthFast is followed, but the final loop, which shifts and masks the bytes bitwise, produces different results. I don't understand the algorithm the function uses well enough to see what is going wrong.
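
For cross-checking, here is a minimal sketch of a naive count that does not depend on lazutf8 at all. It relies only on the fact that every UTF-8 continuation byte has the bit pattern 10xxxxxx, so the number of code points is the number of bytes that do not match that pattern. NaiveUTF8Length is a hypothetical helper written for this test, not a lazutf8 function, and I am not claiming it mirrors what UTF8LengthFast does internally. The string is built from explicit byte escapes so the result does not depend on source-file encoding or codepage settings.

```pascal
program utf8naive;

{$mode objfpc}{$H+}

// Hypothetical helper, not part of lazutf8: counts UTF-8 code points
// by counting every byte that is not a continuation byte (10xxxxxx).
function NaiveUTF8Length(const s: string): PtrInt;
var
  i: SizeInt;
begin
  Result := 0;
  for i := 1 to Length(s) do
    if (Ord(s[i]) and $C0) <> $80 then
      Inc(Result);
end;

const
  // The three UTF-8 bytes of the euro sign (U+20AC), written explicitly.
  Euro = #$E2#$82#$AC;
begin
  writeln('Bytes=', Length(Euro));
  writeln('Naive=', NaiveUTF8Length(Euro));
end.
```

Of the three bytes E2 82 AC, only the first is not a continuation byte, so a correct count for this string is 1, which matches the Fedora output above. If this naive count is right on the M1 while UTF8LengthFast is wrong, that would point at the optimised loop (or how it is compiled) rather than at the string data.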



--
_______________________________________________
lazarus mailing list
lazarus@lists.lazarus-ide.org
https://lists.lazarus-ide.org/listinfo/lazarus