Re: [fpc-devel] vmul commutative optimization?

Marco van de Voort Fri, 15 Nov 2019 04:08:37 -0800


On 14/11/2019 at 01:14, J. Gareth Moreton wrote:

I guess that means testing with VS?
Testing with Visual Studio or even GCC under Windows is a good idea if you want to be sure how particular record types are transferred. The example given in that article has two fields of type __m128, even though it looks like only one of the four vector elements is used initially. Regardless, under the default Microsoft calling convention, that would be passed by reference, just like a record of two Doubles. A (packed) record of two Singles would be passed by value in an integer register, just to cause trouble with conversions!
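A minimal C sketch of the size rule described above, assuming the default Microsoft x64 convention (an aggregate goes by value in a register only when it is 1, 2, 4 or 8 bytes; anything else is passed by reference). The type names and the helper win64_passed_in_register are made up for illustration; this does not query a compiler, it only restates the rule:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record types, named for this sketch only. */
typedef struct { float  re, im; } ComplexSingle;  /* two Singles */
typedef struct { double re, im; } ComplexDouble;  /* two Doubles */

/* Default Microsoft x64 rule (not vectorcall): an aggregate is passed
   by value in a register only if its size is 1, 2, 4 or 8 bytes.
   Otherwise the caller passes a pointer to a copy. */
static bool win64_passed_in_register(size_t size)
{
    return size == 1 || size == 2 || size == 4 || size == 8;
}

/* Checks the two cases discussed in the mail. */
static void check_record_passing(void)
{
    assert(win64_passed_in_register(sizeof(ComplexSingle)));   /* 8 bytes  */
    assert(!win64_passed_in_register(sizeof(ComplexDouble)));  /* 16 bytes */
}
```

Under that rule the two-Singles record travels in an integer register and the two-Doubles record goes by reference. Actual compiler output should still be checked, as suggested above.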

To be clear: I meant whether a 64-bit vector of 2 singles is passed in an XMM register instead of in an integer register with vectorcall.

It was meant more as a research point; I don't need it anymore. After realizing that I either need autovectorizing or intrinsics, I simply started doing a simple translation to assembler, a naive 1:1 translation (but then with complex as two singles in an XMM). It took a bit of fiddling to define multiplying with j in xmm assembler (doing NOT on one of the two singles), but otherwise it was simple.
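The assembler itself is not shown in this mail, so the following is only a scalar C sketch of what multiplying by j amounts to, under the assumption that the complex value is stored as (re, im) in two singles: (re + j*im) * j = -im + j*re, i.e. swap the two lanes and flip the sign bit of one of them. The names ComplexSingle, flip_sign and mul_j are hypothetical. In an XMM register the same effect would typically come from a shuffle plus an XOR against a sign-bit mask.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: one complex number as two singles. */
typedef struct { float re, im; } ComplexSingle;

/* Flip the IEEE 754 sign bit of a single. This is the scalar
   counterpart of XORing a lane with 0x80000000. */
static float flip_sign(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits ^= UINT32_C(0x80000000);
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Multiply by j: swap the lanes, then negate the new real part. */
static ComplexSingle mul_j(ComplexSingle z)
{
    ComplexSingle r;
    r.re = flip_sign(z.im);
    r.im = z.re;
    return r;
}
```

Applying mul_j four times returns the original value, which is a cheap sanity check for any assembler version of the same operation.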

I got the first stage (the radix functions for the radices that I use: 4, 5, 10) working, and both run time and instruction count dropped by a factor of 3 (not entirely logical, since the asm version has relatively more complex instructions).

Under vectorcall, a record of two Singles would be treated as a Homogeneous Float Aggregate and pass the two fields in XMM0 and XMM1

Afaik FPC doesn't do that yet. It is passed in an int register. Pity; as an _m64 register it would have been nice for complex-with-singles.

, and the same thing happens with an unaligned record of two Doubles. If a record of two Doubles is aligned to a 16-byte boundary though, or is otherwise a union with a __m128 type (with the two Doubles aliased to the lower and upper 64 bits respectively), then it can be passed in its entirety through XMM0.
Some things are a little bit messy and opaque with __m128 though, and just making an aligned array of 4 Singles or 2 Doubles doesn't always work - it needs to be typecast through __m128 in some way - but I think that's mostly because C++ wasn't really designed with alignment in mind. In Free Pascal, you have to make a bit of a messy union to ensure everything works; for example:

I already use that union copied from your patch, but changed to singles. But it doesn't do much.


_______________________________________________
fpc-devel maillist  -  [email protected]
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
