Hi,

At 04:30 PM 2/11/2001 -0500, [EMAIL PROTECTED] wrote:
>One of the drawbacks of doing it
>by hand in assembler...too bad high-quality HLL compilers (i.e. ones
>capable of giving 80-90% of the performance of laboriously coded and
>hand-tuned ASM, for complex, data-nonlocal algorithms requiring lots
>of data prefetch) appear to be nigh-impossible to write for CISCs like
>the x86 family.

One of the biggest challenges is discovering the innermost workings
of the P4.  The P4 documentation does not detail every penalty one can
run into.  I create lots of test cases and time them.  Analyzing the
timing difference between two nearly identical code fragments can often
lead to insights into the P4 architecture.
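
For readers who want to try this kind of comparison, here is a minimal
sketch in C.  It is not the actual test code, which is hand-written
assembler.  The names best_seconds, fragment_a, fragment_b and
percent_faster are all hypothetical, and the two fragments are stand-ins
that differ only by a two-way unroll.  It also assumes that clock() is
precise enough, which only holds for fairly long runs; a cycle counter
would be the better tool for short fragments.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* A fragment under test: does `iters` steps of work and returns a
   checksum so the compiler cannot discard the loop. */
typedef unsigned long (*fragment_fn)(size_t iters);

/* Results are stored here so the calls are not optimized away. */
static volatile unsigned long sink;

/* Run the fragment `trials` times and keep the fastest run, which
   filters out most interference from interrupts and other processes. */
static double best_seconds(fragment_fn f, size_t iters, int trials)
{
    assert(trials > 0);
    double best = -1.0;
    for (int t = 0; t < trials; t++) {
        clock_t start = clock();
        sink = f(iters);
        clock_t stop = clock();
        double s = (double)(stop - start) / CLOCKS_PER_SEC;
        if (best < 0.0 || s < best)
            best = s;
    }
    return best;
}

/* Hypothetical fragment A: a plain dependent-accumulate loop. */
static unsigned long fragment_a(size_t iters)
{
    unsigned long sum = 0;
    for (size_t i = 0; i < iters; i++)
        sum += i ^ (sum >> 3);
    return sum;
}

/* Hypothetical fragment B: the same computation with one small
   variation (unrolled by two), so any timing gap is due to that change. */
static unsigned long fragment_b(size_t iters)
{
    unsigned long sum = 0;
    size_t i = 0;
    for (; i + 1 < iters; i += 2) {
        sum += i ^ (sum >> 3);
        sum += (i + 1) ^ (sum >> 3);
    }
    if (i < iters)              /* leftover step when iters is odd */
        sum += i ^ (sum >> 3);
    return sum;
}

/* Relative difference in percent; positive means B beat A.
   Returns 0 if A's time was too short for the clock to measure. */
static double percent_faster(double a_seconds, double b_seconds)
{
    if (a_seconds <= 0.0)
        return 0.0;
    return 100.0 * (a_seconds - b_seconds) / a_seconds;
}
```

A typical use would be to check first that fragment_a(n) and
fragment_b(n) return the same checksum, then compare
best_seconds(fragment_a, n, trials) against
best_seconds(fragment_b, n, trials) with percent_faster.  Keeping the
two fragments identical except for one change is what makes the timing
difference interpretable.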

Today's mystery:  I've found a case where adding a NOP instruction
speeds up the code by 9%.  This is not confined to one small loop: the
9% speedup affects the entire 2nd pass of the FFT!  As of now I have no
theories or explanations.  Time to write some more code fragments to
figure it out...

Regards,
George


_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers
