Hi,
At 04:30 PM 2/11/2001 -0500, [EMAIL PROTECTED] wrote:
>One of the drawbacks of doing it
>by hand in assembler...too bad high-quality HLL compilers (i.e. ones
>capable of giving 80-90% of the performance of laboriously coded and
>hand-tuned ASM, for complex, data-nonlocal algorithms requiring lots
>of data prefetch) appear to be nigh-impossible to write for CISCs like
>the x86 family.
One of the biggest challenges is discovering the innermost workings
of the P4. The P4 documentation does not detail every penalty one can
run into. I create lots of test cases and time them. Analyzing the
timing difference between two nearly identical code fragments can often
lead to insights into the P4 architecture.
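
A minimal sketch of that kind of A/B timing comparison follows. It is in
Python only for brevity; the real tests would be assembler fragments. The
names min_time, compare, variant_a and variant_b are hypothetical, and
taking the minimum over several runs is an assumed way of suppressing
timing noise, not a description of the actual test harness.

```python
import time

def min_time(fn, repeats=7, loops=20000):
    """Return the smallest wall-clock time (seconds) seen for `loops` calls of fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for _ in range(loops):
            fn()
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)  # the minimum is least disturbed by other activity
    return best

def compare(fn_a, fn_b, **kw):
    """Time two fragments; return (t_a, t_b, percent change of B relative to A)."""
    t_a = min_time(fn_a, **kw)
    t_b = min_time(fn_b, **kw)
    return t_a, t_b, 100.0 * (t_b - t_a) / t_a

# Two hypothetical, nearly identical fragments (illustration only).
DATA = list(range(64))

def variant_a():
    total = 0
    for x in DATA:
        total += x * x          # in-place add
    return total

def variant_b():
    total = 0
    for x in DATA:
        total = total + x * x   # same result, slightly different operation
    return total

if __name__ == "__main__":
    t_a, t_b, pct = compare(variant_a, variant_b)
    print(f"A: {t_a:.6f}s  B: {t_b:.6f}s  B vs A: {pct:+.1f}%")
```

Both variants must compute the same result, so any measured difference
comes from the one change between them and not from doing different work.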
Today's mystery: I've found a case where adding a NOP instruction
speeds up the code by 9%. This is not just a small loop: the 9% speedup
affects the entire 2nd pass of the FFT! As of now I have no theories
or explanations. Time to write some more code fragments to figure
it out...
Regards,
George