>The "r->x" alternative results in "vector" decoding on amdfam10. This is 
>AMD-speak for microcoded instructions, and AMD optimization manual strongly 
>recommends avoiding them. I have CC'd Ganesh, maybe he can provide more 
>relevant data on the performance impact.

Thanks Uros!

Yes, the AMD SWOG recommends precisely what Uros mentions.
<snip from SWOG for BD>
When moving data from a GPR to an XMM register, use separate store and load
instructions to move the data first from the source register to a temporary
location in memory and then from memory into the destination register.
</snip>
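
To make the two forms concrete, here is a rough C sketch. It is my own
illustration, not taken from the SWOG or from the patch, and the function
names are made up. It assumes GCC-style inline asm on an x86 target with SSE2.
Note that the intrinsic in the first function only usually becomes a direct
"movd reg, xmm"; depending on -mtune the compiler may already go through
memory on its own.

```c
#include <emmintrin.h>  /* SSE2 intrinsics: __m128i, _mm_cvtsi32_si128 */
#include <stdint.h>

/* Direct GPR -> XMM move (hypothetical helper). Usually a single
   "movd %reg, %xmm", i.e. the r->x form under discussion. */
static __m128i gpr_to_xmm_direct(int32_t v)
{
    return _mm_cvtsi32_si128(v);
}

/* SWOG-style move (hypothetical helper): store the GPR to a temporary
   memory slot, then load the XMM register from that slot. */
static __m128i gpr_to_xmm_via_memory(int32_t v)
{
    int32_t slot;   /* temporary location in memory */
    __m128i x;
    __asm__ ("movl %2, %1\n\t"   /* store: GPR -> memory */
             "movd %1, %0"       /* load:  memory -> XMM, upper bits zeroed */
             : "=x" (x), "=m" (slot)
             : "r" (v));
    return x;
}
```

Both helpers leave the same value in the XMM register; only the instruction
sequence differs, which is what matters for the decode path.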

This is listed as an optimization too. It holds for all amdfam10 and BD
family processors.
I still have to dig through the performance numbers; I will try to get them.

Regards
Ganesh
