On 27 Sep 99, at 4:22, Paul Leyland wrote:

> Actually, we at Microsoft Research in Cambridge have seen similar effects
> when compiling and running FFTW code.  Our discovery is that the alignment
> of FP data values is critical.  Get it wrong, and performance can plummet.
> Unless you set the alignment explicitly, it will be wrong approximately half
> the time.

So, is a future release of MSVC++ going to include an option to 
optimize the alignment of FP data values, even at the cost of the 
extra storage compared with packing data values as tightly as possible?

I think this mainly applies to quadword operands (doubles in C), which 
should be aligned on an 8-byte boundary so that one memory bus cycle 
is sufficient. This strategy also keeps operands from spanning cache 
line boundaries, which would likely have a serious effect on performance 
by effectively halving the associativity of the L1 data cache.
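
To make the point about 8-byte alignment concrete, here is a minimal sketch in C. The helper names (is_aligned, spans_cache_line, align_up) are made up for illustration, and the 32-byte cache line is an assumption that fits Pentium-class L1 caches; check the figure for your own CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: 32-byte L1 cache lines; adjust for the target CPU. */
#define CACHE_LINE 32u

/* Returns 1 if addr is a multiple of alignment, 0 otherwise. */
static int is_aligned(uintptr_t addr, size_t alignment)
{
    assert(alignment > 0);
    return addr % alignment == 0;
}

/* Returns 1 if an operand of `size` bytes starting at addr
 * occupies more than one cache line, 0 otherwise. */
static int spans_cache_line(uintptr_t addr, size_t size)
{
    assert(size > 0);
    return (addr / CACHE_LINE) != ((addr + size - 1) / CACHE_LINE);
}

/* Rounds addr up to the next multiple of alignment.
 * alignment must be a power of two. */
static uintptr_t align_up(uintptr_t addr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr + (alignment - 1)) & ~(uintptr_t)(alignment - 1);
}
```

With these, a double at offset 28 (4-byte aligned only) gives spans_cache_line(28, 8) == 1, while the same double moved to align_up(28, 8) == 32 sits entirely within one line. Any 8-byte operand on an 8-byte boundary stays inside a single line, because 8 divides the line size.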

Alignment on 4-byte boundaries is quite sufficient for C floats. 
Ten-byte reals (direct copies from FPU registers) are a problem: you 
are always going to need two memory bus cycles, since you can't fit 
an 80-bit operand on a 64-bit bus. However, whether you pack a 
ten-byte real array contiguously (with no wasted space) or align the 
elements on 16-byte boundaries (with lots of wasted space, but no 
cache line conflicts) could have a significant effect on performance, 
and the effect might work in different directions in different 
applications.
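
The trade-off for ten-byte reals can be put in numbers. The sketch below counts how many elements of an array cross a cache line boundary for a given stride. The function name count_straddlers is hypothetical, and it assumes the array starts on a cache line boundary; the line size is passed in, so nothing here depends on a particular CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Counts the elements that cross a cache line boundary.
 *   count  - number of elements in the array
 *   stride - bytes from the start of one element to the next
 *   size   - bytes actually occupied by each element
 *   line   - cache line size in bytes
 * ASSUMPTION: element 0 starts exactly on a cache line boundary. */
static size_t count_straddlers(size_t count, size_t stride,
                               size_t size, size_t line)
{
    size_t i, n = 0;

    assert(size > 0 && line > 0);
    for (i = 0; i < count; i++) {
        size_t first = i * stride;       /* offset of first byte */
        size_t last = first + size - 1;  /* offset of last byte  */
        if (first / line != last / line)
            n++;
    }
    return n;
}
```

For 16 elements and an assumed 32-byte line, count_straddlers(16, 10, 10, 32) gives 4 and count_straddlers(16, 16, 10, 32) gives 0. So the packed layout uses 160 bytes with a quarter of the elements split across two lines, while the padded layout uses 256 bytes with none split. Which of these wins depends on whether the application is limited by cache capacity or by the cost of split accesses.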

Regards
Brian Beesley
_________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers
