On 21/10/2019 20:00, Florian Klämpfl wrote:
What's the problem with

{$codealign RECORDMIN=16}
type complex = record
                      re : real;
                      im : real;


Hi Florian,

I tried that, but it places each individual field on a 16-byte boundary (and the documentation for RECORDMIN implies that this is the correct behaviour), which is why only an array element works. This happens even if "packed record" is used. The generated assembly language confirms this:

    movsd    U_$P$COMPLEX_$$_Z(%rip),%xmm0
    addsd    U_$P$COMPLEX_$$_X(%rip),%xmm0
    movsd    %xmm0,40(%rsp)
    movsd    U_$P$COMPLEX_$$_Z+16(%rip),%xmm0
    addsd    U_$P$COMPLEX_$$_X+16(%rip),%xmm0
    movsd    %xmm0,56(%rsp)
    movq    40(%rsp),%rax
    movq    %rax,U_$P$COMPLEX_$$_Z(%rip)
    movq    48(%rsp),%rax
    movq    %rax,U_$P$COMPLEX_$$_Z+8(%rip)
    movq    56(%rsp),%rax
    movq    %rax,U_$P$COMPLEX_$$_Z+16(%rip)

The two instructions that use 48(%rsp) seem to copy the 8 filler bytes between the two fields. I question their validity, if not their necessity, since the rest of the subroutine implies that the data at 48(%rsp) is undefined.
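For anyone who wants to check the field spacing without reading the assembly, here is a minimal sketch. The program name and the variable "z" are made up for illustration; the directive and the record are the ones from the quoted suggestion, and the offsets are computed at run time by subtracting addresses. I have deliberately not listed expected values, since they depend on the compiler version and target.

```pascal
program RecordLayout;

{$mode objfpc}
{$codealign RECORDMIN=16}

type
  complex = record
    re: real;
    im: real;
  end;

var
  z: complex;  { hypothetical variable, only used to take field addresses }

begin
  { Total size of the record, including any filler bytes }
  WriteLn('SizeOf(complex) = ', SizeOf(complex));
  { Distance of each field from the start of the record }
  WriteLn('offset of re    = ', PtrUInt(@z.re) - PtrUInt(@z));
  WriteLn('offset of im    = ', PtrUInt(@z.im) - PtrUInt(@z));
end.
```

If 'im' is reported at offset 16 rather than 8, that matches the "+16" displacements in the listing above.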

In the compiler code that sets up XMM parameters (compiler/x86_64/cpupara.pas, line 945), the following condition dictates whether a packed vector register is used or two separate registers:

                    if Assigned(parentdef) and ((parentdef.aggregatealignment mod 16) = 0) and ((byte_offset mod parentdef.aggregatealignment) <> 0) then
                      { Aligned vector of type double }

(parentdef is the "complex" type in this case, while the regular def is one of the real-type fields).

"byte_offset" for 're' is 0 and for 'im' is 8.  The problem is that parentdef.aggregatealignment, by default, is equal to 8, which indicates that, as far as the compiler is concerned, it is only guaranteed to be aligned to an 8-byte boundary, not 16 (the "mod" expression checks this).  When using the RECORDMIN=16 construct above, the aggregatealignment does increase to 16, but "byte_offset" for 'im' also becomes 16.  At the moment, there's no way to configure a record type to have an aggregate alignment of 16 and a field alignment of 8.
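To make the arithmetic concrete, here is a small stand-alone sketch of that condition. The function name TakesAlignedVectorPath is hypothetical and is not part of the compiler; it copies only the two "mod" tests from the quoted line and leaves out the Assigned(parentdef) check. The three calls use the alignment and offset values for 'im' described above.

```pascal
program AlignCheck;

{$mode objfpc}

{ Hypothetical stand-in for the test quoted from cpupara.pas.
  The real code reads the alignment from parentdef. }
function TakesAlignedVectorPath(AggregateAlignment, ByteOffset: Integer): Boolean;
begin
  Result := ((AggregateAlignment mod 16) = 0) and
            ((ByteOffset mod AggregateAlignment) <> 0);
end;

begin
  { Default layout: aggregate alignment 8, 'im' at offset 8 }
  WriteLn('default      : ', TakesAlignedVectorPath(8, 8));
  { RECORDMIN=16: aggregate alignment 16, but 'im' moves to offset 16 }
  WriteLn('RECORDMIN=16 : ', TakesAlignedVectorPath(16, 16));
  { Wanted but not configurable: aggregate alignment 16, 'im' at offset 8 }
  WriteLn('wanted       : ', TakesAlignedVectorPath(16, 8));
end.
```

Only the last combination satisfies the condition: the first fails the "mod 16" test, and the second fails because 16 mod 16 is 0.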

Regarding the stack being aligned to 16-byte boundaries: while this can be guaranteed for local variables and formal parameters, the actual parameters passed into the function may not be aligned (e.g. when dereferencing a pointer to the heap after calling, say, "New(complex);"), which is why the compiler can only go by the 8-byte aggregate alignment.
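A rough way to look at the heap case is to print the address of a freshly allocated record modulo 16. This is only a sketch: the program name and the PComplex pointer type are made up for illustration, and the result depends on the memory manager, so a single run that happens to print 0 does not prove a guarantee.

```pascal
program HeapAlign;

{$mode objfpc}

type
  complex = record
    re: real;
    im: real;
  end;
  PComplex = ^complex;  { hypothetical pointer type for this sketch }

var
  p: PComplex;

begin
  New(p);
  { A non-zero remainder means this instance is not on a 16-byte boundary }
  WriteLn('address mod 16 = ', PtrUInt(p) mod 16);
  Dispose(p);
end.
```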

Gareth aka. Kit

fpc-devel maillist  -  fpc-devel@lists.freepascal.org