It's true. With VMULSS, only the first parameter (third parameter under
Intel notation) can be an address (source: Intel(R) 64 and IA-32
Architectures Software Developer's Manual, Volume 2B, Page 4-154).
I'll see if I can work in that optimisation for the commutative
operations (+ and *) at some point from the node side.
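
As a rough illustration of the idea (a toy sketch only, not actual
compiler code): for a commutative operator, if the left operand is a
constant and the right one is not, swap them so the constant ends up in
the position that can be encoded as a memory reference. TBinNode,
TOperandKind and CanonicaliseOperands are hypothetical names made up for
this example and do not correspond to FPC's real node classes.

```pascal
program SwapSketch;

{$mode objfpc}

type
  TOpKind = (okAdd, okSub, okMul, okDiv);
  TOperandKind = (opRegister, opConstant);

  { Hypothetical stand-in for a binary expression node;
    this is NOT one of FPC's real node classes. }
  TBinNode = record
    Op: TOpKind;
    Left, Right: TOperandKind;
  end;

{ For commutative operators only, move a constant left operand to the
  right so that it can become the memory operand of the instruction. }
procedure CanonicaliseOperands(var Node: TBinNode);
var
  Tmp: TOperandKind;
begin
  if (Node.Op in [okAdd, okMul]) and
     (Node.Left = opConstant) and (Node.Right <> opConstant) then
  begin
    Tmp := Node.Left;
    Node.Left := Node.Right;
    Node.Right := Tmp;
  end;
end;

var
  N: TBinNode;
begin
  { Models asingle := someconstant * othersingle }
  N.Op := okMul;
  N.Left := opConstant;
  N.Right := opRegister;
  CanonicaliseOperands(N);
  { After the swap the constant should be the right operand }
  WriteLn(N.Left = opRegister, ' ', N.Right = opConstant);
end.
```

Subtraction and division are deliberately left alone, since swapping
their operands would change the result.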
Gareth aka. Kit
On 12/11/2019 12:22, Marco van de Voort wrote:
I compiled some bits with avx, and noticed that when you do
asingle:=someconstant*othersingle;
then that generates something like
vmovss TC_$FFTS_$$_C31(%rip),%xmm2
vmulss %xmm0,%xmm2,%xmm0
while if you do
asingle:=othersingle*someconstant;
it generates
vmulss TC_$FFTS_$$_C32(%rip),%xmm2,%xmm2
I assume the reason is that only the first param can be an address,
and the second a register. But the compiler isn't smart enough to
exchange them.
_______________________________________________
fpc-devel maillist - [email protected]
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel