bearophile wrote:
> Robert Jacques:
>
>> Yes, but the unaligned version is slower, even for aligned data.
>
> This is true today, but in future it may become a little less true, thanks to
> improvements in the CPUs.

The problem is that the difference today is so extreme. On Core2:
 movaps [mem128], xmm0; // aligned,   1 micro-op
 movups [mem128], xmm0; // unaligned, 9 micro-ops, even on aligned data!
In practice it's about an 8X speed difference!

On AMD K8, it's only 2 vs 5 ops, and on K10 it's 2 vs 3 ops.
On i7, movups on aligned data is the same speed as movaps. It's still slower if it's an unaligned access.

It all depends on how important you think performance on Core2 and earlier Intel processors is.
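
If Core2 does matter, one common workaround is to keep movups out of the hot loop: handle the misaligned head with scalar stores, then use aligned stores for the rest. Below is a minimal sketch in C with SSE intrinsics rather than inline asm. The function name fill_floats is made up for illustration, and it assumes an x86 target where _mm_store_ps compiles to movaps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE: __m128, _mm_set1_ps, _mm_store_ps */

/* Hypothetical helper (not from the original post): write `value` into
 * dst[0..n-1], using only aligned 128-bit stores in the main loop. */
static void fill_floats(float *dst, size_t n, float value)
{
    assert(dst != NULL || n == 0);

    __m128 v = _mm_set1_ps(value);
    size_t i = 0;

    /* Scalar head: advance until dst + i sits on a 16-byte boundary. */
    while (i < n && ((uintptr_t)(dst + i) & 15) != 0)
        dst[i++] = value;

    /* Aligned body: _mm_store_ps requires a 16-byte-aligned address,
     * which the head loop has just established. */
    for (; i + 4 <= n; i += 4)
        _mm_store_ps(dst + i, v);

    /* Scalar tail: the remaining 0 to 3 elements. */
    for (; i < n; ++i)
        dst[i] = value;
}
```

This only helps when the destination can be aligned on its own. When a source and a destination are misaligned relative to each other, one side still needs an unaligned access.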
