Hi,

At 11:41 AM 10/14/2001 +0000, [EMAIL PROTECTED] wrote:
> > According to latest benchmarks (http://www.mersenne.org/bench.htm),
> > AthlonXP seems to be slower than the Thunderbird.  Does anybody have a
> > technical explanation ?

The technical explanation is that I only have one XP benchmark.  I'm sure
that as I get more benchmarks a better tuned system will report better timings.

>The Athlon XP lies about its speed. Remember the old Cyrix trick?
>Well AMD have gone for the same - pick a benchmark that suits
>you, then claim your chip is "1800+" if it runs _that_ benchmark a
>wee bit faster than the opposition's chip running at 1800 MHz.

I'll try to post the real processor speed on my page, though it will be a
bit of a pain keeping it all straight.  I understand why AMD did this; at
least they have been relatively honest (that is, conservative) in choosing
their "equivalent P4 cpu speed" number.

>That seems to have been the case for some months now; it's
>SSE2 which makes the difference. A P4 running Prime95 is well
>over twice as fast as a T'bird running at the same clock speed.

The P4 has two advantages.  One is memory bandwidth.  The other
is SSE2.  Interestingly, the theoretical floating point throughput of the P4
is identical to the Athlon's (one multiply and one add per clock), and the
Athlon even has lower latencies.
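
As a rough illustration of that peak-rate arithmetic, here is a minimal
sketch.  The 1400 MHz clock is just an assumed example value, not a measured
system, and peak_mflops is a made-up helper name:

```python
def peak_mflops(clock_mhz, flops_per_clock=2):
    """Theoretical peak in MFLOPS.

    flops_per_clock=2 models "one multiply and one add per clock".
    """
    return clock_mhz * flops_per_clock


# Assumed example: both chips at the same (hypothetical) 1400 MHz clock.
for chip in ("P4", "Athlon"):
    print(chip, peak_mflops(1400), "MFLOPS peak")
```

On paper both chips come out the same at equal clock speed, so the
difference seen in practice has to come from how well each chip can be
kept fed.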

So why is SSE2 an advantage?  I theorize that there are 3 reasons.
1) It relieves the register pressure.  With SSE2 there are 16 floating point
values in registers (8 XMM registers holding two doubles each) instead of 8.
It is much easier to schedule operations that are not dependent on previous
operations with twice as many data values.
2) The directly addressable registers are easier for the chip to keep track
of than the standard x86 FPU register stack.  I'm guessing there is some
limit as to how much internal register renaming can hide this problem in 
the Athlon.
3) Once an SSE2 instruction is ready, it starts two different FPU operations
on two consecutive clock cycles.  In other words, the CPU only needs to find
a new instruction to execute every other clock cycle to keep the floating
point units busy.  This is of course heavily related to reason #1.
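
Reasons #1 and #3 can be sketched with a toy issue model.  This is only an
illustration under assumed numbers (a 4-cycle latency and a handful of
independent dependency chains).  fpu_utilization and its parameters are
hypothetical names, and the model does not describe the real P4 or Athlon
schedulers:

```python
def fpu_utilization(num_chains, latency, cycles_per_issue=1, cycles=1000):
    """Toy model: fraction of cycles in which the FPU is kept busy.

    num_chains       -- independent dependency chains available to schedule
    latency          -- cycles before a chain's next op may issue (assumed)
    cycles_per_issue -- cycles of FPU work started by one instruction
                        (1 models a scalar op, 2 a packed SSE2 op)
    """
    ready_at = [0] * num_chains  # cycle at which each chain may issue again
    busy_until = 0               # FPU is occupied by the last issue until here
    busy_cycles = 0
    for cycle in range(cycles):
        if cycle < busy_until:
            busy_cycles += 1     # still working on the previous instruction
            continue
        for i in range(num_chains):
            if ready_at[i] <= cycle:
                ready_at[i] = cycle + latency
                busy_until = cycle + cycles_per_issue
                busy_cycles += 1
                break
    return busy_cycles / cycles


print(fpu_utilization(2, 4))                      # scalar, two chains
print(fpu_utilization(4, 4))                      # scalar, four chains
print(fpu_utilization(2, 4, cycles_per_issue=2))  # packed, two chains
```

In this model, two scalar chains leave the FPU idle half the time, while
either doubling the number of independent chains (more registers) or
letting each instruction start two cycles of work (packed SSE2) keeps it
fully busy.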

When AMD finally implements SSE2 it should be a screamer.  The lower
latencies could be a big plus.

Regards,
George

_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers