On 14/11/2019 at 01:14, J. Gareth Moreton wrote:

I guess that means testing with VS?

Testing with Visual Studio or even GCC under Windows is a good idea if you want to be sure how particular record types are transferred.  The example given in that article has two fields of type __m128, even though it looks like only one of the four vector elements is used initially.  Regardless, under the default Microsoft calling convention, that would be passed by reference, just like a record of two Doubles.  A (packed) record of two Singles would be passed by value in an integer register, just to cause trouble with conversions!
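The size rule behind that paragraph can be sketched in C. Under the default Microsoft x64 convention, a struct is passed by value in a register only if its size is 1, 2, 4 or 8 bytes; anything else goes by reference. The type names and the helper ms_x64_passes_in_register are made up for illustration, and the helper only models the size rule; it does not query a compiler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical C counterparts of the records discussed above. */
typedef struct { float  re, im; } two_singles;  /* 8 bytes  */
typedef struct { double re, im; } two_doubles;  /* 16 bytes */

static_assert(sizeof(two_singles) == 8,  "two Singles occupy 8 bytes");
static_assert(sizeof(two_doubles) == 16, "two Doubles occupy 16 bytes");

/* Models the default Microsoft x64 rule: an aggregate of exactly
   1, 2, 4 or 8 bytes is passed by value in an integer register;
   every other size is passed by reference. */
static bool ms_x64_passes_in_register(size_t size)
{
    return size == 1 || size == 2 || size == 4 || size == 8;
}
```

With these definitions, ms_x64_passes_in_register(sizeof(two_singles)) is true and ms_x64_passes_in_register(sizeof(two_doubles)) is false, matching the behaviour described above.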

To be clear: what I meant was whether, with vectorcall, a 64-bit vector of 2 Singles is passed in an XMM register rather than in an integer register.

It was meant more as a research point; I don't need it anymore. After realizing that I need either autovectorization or intrinsics, I simply did a naive 1:1 translation to assembler (but with a complex number stored as two Singles in an XMM register). It took a bit of fiddling to define multiplication by j in XMM assembler (doing a NOT on one of the two Singles), but otherwise it was simple.
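For reference, here is a minimal sketch of multiplying by j with SSE intrinsics in C, for a complex number stored as two Singles in the low half of an XMM register. It is not the assembler routine described above. Since (re + im*j) * j = -im + re*j, the sketch swaps the two lanes and then flips the sign bit of the low lane with an XOR, which is an assumption about what the sign fiddling amounts to. The names complex_single, load_complex, store_complex and mul_by_j are hypothetical.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Hypothetical layout: complex number as two Singles (8 bytes). */
typedef struct { float re, im; } complex_single;
static_assert(sizeof(complex_single) == 8, "must fit the low 64 bits of an XMM register");

/* Place re in lane 0 and im in lane 1; the upper lanes are zero. */
static __m128 load_complex(complex_single z)
{
    return _mm_set_ps(0.0f, 0.0f, z.im, z.re);
}

/* Read lanes 0 and 1 back into a struct. */
static complex_single store_complex(__m128 v)
{
    float lanes[4];
    _mm_storeu_ps(lanes, v);
    complex_single z = { lanes[0], lanes[1] };
    return z;
}

/* (re, im) * j = (-im, re): swap lanes 0 and 1, then negate lane 0. */
static __m128 mul_by_j(__m128 z)
{
    /* Lane 0 takes old lane 1, lane 1 takes old lane 0; lanes 2 and 3 stay. */
    __m128 swapped = _mm_shuffle_ps(z, z, _MM_SHUFFLE(3, 2, 0, 1));
    /* -0.0f has only the sign bit set, so the XOR negates lane 0 alone. */
    const __m128 sign_lane0 = _mm_set_ps(0.0f, 0.0f, 0.0f, -0.0f);
    return _mm_xor_ps(swapped, sign_lane0);
}
```

Calling store_complex(mul_by_j(load_complex(z))) with z = 3 + 4j gives -4 + 3j.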

I got the first stage done (the radix functions for the radices that I use: 4, 5 and 10) and got things working, and both run time and instruction count went down by a factor of 3. (Not entirely logical, since the asm version has relatively more complex instructions.)

Under vectorcall, a record of two Singles would be treated as a Homogeneous Float Aggregate, with the two fields passed in XMM0 and XMM1

Afaik FPC doesn't do that yet; it passes the record in an integer register. Pity: passing it as an __m64 in a register would have been nice for complex numbers made of Singles.

, and the same thing happens with an unaligned record of two Doubles.  If a record of two Doubles is aligned to a 16-byte boundary though, or is otherwise a union with a __m128 type (with the two Doubles aliased to the lower and upper 64 bits respectively), then it can be passed in its entirety through XMM0.

Some things are a little bit messy and opaque with __m128 though, and just making an aligned array of 4 Singles or 2 Doubles doesn't always work - it needs to be typecast through __m128 in some way - but I think that's mostly because C++ wasn't really designed with alignment in mind.  In Free Pascal, you have to make a bit of a messy union to ensure everything works; for example:

I already use that union, copied from your patch but changed to Singles. It doesn't do much, though.
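For comparison only, the C-side shape of such a union might look like the sketch below. It is not the Free Pascal union from the patch. The type name vec4_single and the helper make_vec4 are hypothetical; the point is that overlaying the Singles on a __m128 member gives the whole type 16-byte size and alignment.

```c
#include <assert.h>
#include <stdalign.h>
#include <xmmintrin.h>

/* Hypothetical union: four Singles aliased onto one 128-bit vector. */
typedef union {
    __m128 v;     /* the vector view; forces 16-byte alignment */
    float  s[4];  /* element view of the same 16 bytes */
} vec4_single;

static_assert(sizeof(vec4_single) == 16,  "union should be exactly one XMM register wide");
static_assert(alignof(vec4_single) == 16, "union should inherit the alignment of __m128");

/* Build the union through the vector member; a ends up in s[0], d in s[3]. */
static vec4_single make_vec4(float a, float b, float c, float d)
{
    vec4_single u;
    u.v = _mm_set_ps(d, c, b, a);
    return u;
}
```

Whether a given compiler then passes such a type in a single XMM register still depends on the calling convention in use, so this only covers the layout side.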

_______________________________________________
fpc-devel maillist  -  [email protected]
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
