Hi everyone,

I recently opened a merge request that initially just fixed the incorrect memory alignment of __m128 and similar types, but doing so exposed a number of other bugs. First, once the alignment was fixed, __m128 etc. were no longer recognised as valid SIMD or aggregate types, because the wrong alignment field was being checked at one point. Some tests with vectorcall then revealed bad code being generated in places.

This may have to be a long work in progress.  I've also found another bug:

program m128test;

function Test3(V1, V2: __m128d): __m128d; vectorcall;
begin
  Test3[0] := V1[0] + V2[0];
  Test3[1] := V1[1] + V2[1];
end;

begin
end.

This raises Internal error 200410108 when compiled with -O2 for x86_64-win64. It only occurs with __m128d, not __m128 or __m128i (although __m128i seems to have its own problems). My merge request fixes the internal error, but bad code is produced instead. When __m128 or __m128i is used in place of __m128d, the following assembly is produced under -O2:

.section .text.n_p$m128test_$$_test1$__m128$__m128$$__m128,"ax"
    .balign 16,0x90
.globl    P$M128TEST_$$_TEST1$__M128$__M128$$__M128
P$M128TEST_$$_TEST1$__M128$__M128$$__M128:
.seh_proc P$M128TEST_$$_TEST1$__M128$__M128$$__M128
    leaq    -40(%rsp),%rsp
.seh_stackalloc 40
.seh_endprologue
    movq    %rcx,%rax
    movq    %xmm1,(%rsp)
    movq    %xmm2,8(%rsp)
    movq    %xmm3,16(%rsp)
    movq    %xmm4,24(%rsp)
    movss    (%rsp),%xmm0
    addss    16(%rsp),%xmm0
    movss    %xmm0,(%rax)
    movss    4(%rsp),%xmm0
    addss    20(%rsp),%xmm0
    movss    %xmm0,4(%rax)
    leaq    40(%rsp),%rsp
    ret
.seh_endproc

The fact that the same code is produced for __m128i, which is meant to hold integers, is worrying. That aside, the generated code is clearly wrong, even ignoring the fact that the parameters are passed on the stack instead of in registers and that %rcx appears to be a hidden parameter pointing to the result. -sr reveals that V1 is at (%rsp) and V2 is at 16(%rsp), but the first thing the function does is overwrite their contents with undefined values: the four movq instructions store %xmm1 to %xmm4 over them. If the operands were reversed, so that the values were loaded from the stack into the registers, this would make more sense.
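
For reference, the source behind the assembly listing above is essentially Test3 with the type swapped. A minimal sketch, assuming the routine was named Test1 (as the mangled symbol suggests) and that only the first two elements are added (as the two addss instructions suggest); the exact source is an assumption and is not quoted from the original test:

```pascal
program m128test;

{ Assumed reconstruction: same body as Test3, but using __m128. }
{ The name Test1 is inferred from the symbol in the listing.    }
function Test1(V1, V2: __m128): __m128; vectorcall;
begin
  Test1[0] := V1[0] + V2[0];
  Test1[1] := V1[1] + V2[1];
end;

begin
end.
```

Compiling this with -O2 for x86_64-win64 should be the way to compare against the listing; swapping __m128 for __m128i is the variant that reportedly yields the same output.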

Gareth aka. Kit

P.S. I started making this fix to aid with vectorisation development.



_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
