On 12/11/2019 at 16:08, J. Gareth Moreton wrote:
It's true.  With VMULSS, only the first parameter (the third parameter under Intel notation) can be a memory operand (source: Intel(R) 64 and IA-32 Architectures Software Developer's Manual, Volume 2B, page 4-154).

I'll see if I can work in that optimisation for the commutative operations (+ and *) at some point from the node side.
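
To make the operand-order reasoning concrete, here is a minimal sketch in C. It is not FPC's code generator; the names loc_t, plan_t and plan_operands are hypothetical. It only assumes what is stated above: under Intel notation, only the last source of an instruction like VMULSS may be in memory.

```c
#include <stdbool.h>

/* Hypothetical operand locations: in a register or in memory. */
typedef enum { LOC_REG, LOC_MEM } loc_t;

/* Hypothetical result of planning one "dst = src1 op src2" instruction. */
typedef struct {
    bool swap;        /* exchange src1 and src2 before emitting */
    bool extra_load;  /* a separate load into a register is required */
} plan_t;

/* Only the last source (src2, Intel notation) may be a memory operand. */
static plan_t plan_operands(bool commutative, loc_t src1, loc_t src2)
{
    plan_t p = { false, false };

    if (src1 == LOC_MEM && src2 == LOC_MEM) {
        /* src2 can stay in memory, but src1 has to be loaded first. */
        p.extra_load = true;
    } else if (src1 == LOC_MEM) {
        if (commutative)
            p.swap = true;        /* + and *: move the memory operand to src2 */
        else
            p.extra_load = true;  /* - and /: order matters, so load src1 */
    }
    /* src1 in a register: nothing to do, whatever src2 is. */
    return p;
}
```

For a commutative operation with the memory operand in the first source, plan_operands(true, LOC_MEM, LOC_REG) asks for a swap and no extra load; for a non-commutative one the load cannot be avoided.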

Thanks.

Another tidbit I noticed while playing with (elements of) the complex patch: if I set the element size to double (re: double; im: double), then with vectorcall all the data is loaded into registers.

However, if I make it single (in other words, the tcomplex is 8 bytes), the records are loaded into integer registers, and the compiler stores them to the stack and then reloads them.
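
To illustrate the size difference, a minimal sketch in C (not Pascal, and not the code from the patch). The struct names and the pack/unpack helpers are hypothetical. It shows that the single-precision record is exactly 8 bytes and so fits in one 64-bit general-purpose register, which is presumably why it ends up in integer registers; the default Win64 calling convention treats 8-byte aggregates passed by value the same way.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the two tcomplex variants. */
typedef struct { double re, im; } complex_double;  /* 2 x 8 bytes = 16 */
typedef struct { float  re, im; } complex_single;  /* 2 x 4 bytes =  8 */

/* Copy the 8-byte record into one 64-bit integer, i.e. the image it
   would have when passed by value in a general-purpose register. */
static uint64_t pack_single(complex_single c)
{
    uint64_t bits;
    memcpy(&bits, &c, sizeof bits);
    return bits;
}

/* Recover the record from its integer image. To use re and im as
   floats, the callee must first move them out of the integer
   register, for example via the stack. */
static complex_single unpack_single(uint64_t bits)
{
    complex_single c;
    memcpy(&c, &bits, sizeof c);
    return c;
}
```

The round trip through pack_single and unpack_single preserves both fields, but it mirrors the store and reload described above: the floats are not directly usable while they sit in an integer register.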

This matters less for me, since it won't vectorize anyway (see the inline and philosophy thread). I think I'll change this routine to assembler, accepting a pointer and loading from and storing through that.

_______________________________________________
fpc-devel maillist  -  [email protected]
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
