On 27 Sep 99, at 4:22, Paul Leyland wrote:
> Actually, we at Microsoft Research in Cambridge have seen similar effects
> when compiling and running FFTW code. Our discovery is that the alignment
> of FP data values is critical. Get it wrong, and performance can plummet.
> Unless you set the alignment explicitly, it will be wrong approximately half
> the time.
So, is a future release of MSVC++ going to include an option to
optimize the alignment of FP data values, at the expense of the
storage saved by packing data values as tightly as possible?
I think this mainly applies to quadword operands (doubles in C), which
should be aligned on an 8-byte boundary so that a single memory bus
cycle is sufficient. This strategy also avoids operands spanning cache
line boundaries, which would likely have a serious effect on
performance by effectively halving the associativity of the L1 data
cache.
Alignment on 4-byte boundaries is quite sufficient for C floats.
Ten-byte reals (direct copies from FPU registers) are a problem: you
are always going to need two memory bus cycles, since you can't fit an
80-bit operand on a 64-bit bus. However, whether you pack a ten-byte
real array contiguously (with no wasted space) or align elements on
16-byte boundaries (with lots of wasted space, but no cache line
conflicts) could have a significant effect on performance, and the
effect might go in different directions in different applications.
Regards
Brian Beesley
_________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers