On 22.10.19 at 05:01, J. Gareth Moreton wrote:


Bigger challenges would be optimising the modulus of a complex number:

   function cmod (z : complex): real; vectorcall;
     { modulus : r = |z| }
     begin
        with z do
          cmod := sqrt((re * re) + (im * im));
     end;

A perfect compiler with permission to use SSE3 (for haddpd) should generate the following (note that no stack frame is required):

mulpd    %xmm0, %xmm0 { Calculates "re * re" and "im * im" simultaneously }
haddpd    %xmm0, %xmm0 { Adds the above multiplications together (horizontal add) }
sqrtsd    %xmm0, %xmm0 { Takes the square root of the low double }
ret

Currently, with vectorcall, the routine compiles into this:

leaq    -24(%rsp),%rsp
movdqa    %xmm0,(%rsp)
movq    %rsp,%rax
movsd    (%rax),%xmm1
mulsd    %xmm1,%xmm1
movsd    8(%rax),%xmm0
mulsd    %xmm0,%xmm0
addsd    %xmm1,%xmm0
sqrtsd    %xmm0,%xmm0
leaq    24(%rsp),%rsp
ret

And without vectorcall (or an unaligned record type):

leaq    -24(%rsp),%rsp
movq    %rcx,%rax
movq    (%rax),%rdx
movq    %rdx,(%rsp)
movq    8(%rax),%rax
movq    %rax,8(%rsp)
movq    %rsp,%rax
movsd    (%rax),%xmm1
mulsd    %xmm1,%xmm1
movsd    8(%rax),%xmm0
mulsd    %xmm0,%xmm0
addsd    %xmm1,%xmm0
sqrtsd    %xmm0,%xmm0
leaq    24(%rsp),%rsp
ret


With a few additions to the compiler (the git patch is under 500 lines and not yet ready to commit), I get:

.section .text.n_p$program_$$_cmod$complex$$real,"ax"
        .balign 16,0x90
.globl  P$PROGRAM_$$_CMOD$COMPLEX$$REAL
        .type   P$PROGRAM_$$_CMOD$COMPLEX$$REAL,@function
P$PROGRAM_$$_CMOD$COMPLEX$$REAL:
.Lc2:
# Var $result located in register xmm0
# Var z located in register xmm0
# [test.pp]
# [20] begin
# [22] cmod := sqrt((re * re) + (im * im));
        mulsd   %xmm0,%xmm0
        mulsd   %xmm1,%xmm1
        addsd   %xmm0,%xmm1
        sqrtsd  %xmm1,%xmm0
# Var $result located in register xmm0
.Lc3:
# [23] end;
        ret
.Lc1:
.Le0:
.size P$PROGRAM_$$_CMOD$COMPLEX$$REAL, .Le0 - P$PROGRAM_$$_CMOD$COMPLEX$$REAL

It mainly keeps records in MM registers (XMM here). I am not sure yet that this is the right approach, but allocating one register to each field of a suitable record seems reasonable.
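
To picture the one-register-per-field idea at the source level, here is a minimal sketch. It is not the patch itself. Passing re and im as separate scalar parameters is roughly what such an allocation would amount to. The names cmodsketch and cmod_fields are made up for illustration, and double stands in for real, assuming x86-64 where the two are the same type.

```pascal
program cmodsketch;

{$mode objfpc}

type
  { Same layout as the record discussed above: two floating-point fields }
  complex = record
    re, im: double;
  end;

{ Hypothetical scalar form: each field arrives as its own parameter,
  so each one can live in its own register }
function cmod_fields(re, im: double): double;
begin
  cmod_fields := sqrt((re * re) + (im * im));
end;

{ Record form: splits the record into its fields and forwards them }
function cmod(const z: complex): double;
begin
  cmod := cmod_fields(z.re, z.im);
end;

var
  z: complex;
begin
  z.re := 3.0;
  z.im := 4.0;
  { Modulus of 3 + 4i, printed with one decimal place }
  writeln(cmod(z):0:1);
end.
```

Whether the compiler really emits the same four instructions for cmod_fields as in the listing above depends on the target and optimisation settings, so treat this as an analogy only.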
_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
