Hi,

At 06:50 PM 1/6/00 EST, [EMAIL PROTECTED] wrote:
>Spike Jones wrote:
>
>>Processor gurus, please:  using the equivalence that is suggested
>>by the primenet status page [86.6 P90 CPU yr/day = 1042 GFlops]
>>I calculate that a floating point operation must be about 3 CPU cycles.
>
>Indeed, I calculate ~0.4 FLOP/cycle, which at first glance seems about a
>factor of 2 too slow, even on the humble P90 (which should, assuming no
>cache misses, be able to dispatch one FADD per cycle and (I believe - x86
>experts, please correct me if I'm wrong) one FMUL every other cycle, for
>a peak throughput of 1.5 FLOP/cycle).
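
For anyone who wants to check the arithmetic behind both figures quoted
above, here is a minimal sketch.  It assumes a P90 clocks at 90 MHz,
that "GFlops" means 10^9 floating-point operations per second, and that
a year is 365.25 days; the function name is made up for illustration.

```python
DAYS_PER_YEAR = 365.25  # assumption: average year length


def flop_per_cycle(cpu_years_per_day, gflops, clock_hz):
    """Convert the status-page equivalence into FLOP per clock cycle."""
    # N CPU-years of work delivered per day is the same as
    # N * DAYS_PER_YEAR machines running around the clock.
    machines = cpu_years_per_day * DAYS_PER_YEAR
    # Sustained FLOP per second for a single machine.
    flops_per_machine = gflops * 1e9 / machines
    return flops_per_machine / clock_hz


# Figures from the status page: 86.6 P90 CPU yr/day = 1042 GFlops.
rate = flop_per_cycle(86.6, 1042.0, 90e6)  # 90e6 Hz is the assumed P90 clock
print(f"{rate:.2f} FLOP/cycle, {1 / rate:.2f} cycles/FLOP")
```

Under those assumptions this comes out a little under 0.4 FLOP/cycle,
or a little under 3 cycles per floating point operation, which agrees
with both estimates above.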

The humble P90 can do only one FADD *or* one FMUL per cycle, so its
maximum throughput is 1.0 FLOP/cycle.  Worse yet, a floating-point load
takes one clock and a store takes two clocks, and with only 8 registers
there are a lot of loads and stores.  The PPro, P-II, P-III, and Celeron
have a better architecture that allows loads and stores to run in
parallel with the FADDs and FMULs, which makes it easier to approach the
1.0 FLOP/cycle theoretical maximum.
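
To put rough numbers on that, here is a toy cycle-count sketch.  It is
not a model of either chip's real pipeline: it only uses the costs
given above (1 clock per FADD/FMUL, 1 per load, 2 per store), assumes
nothing overlaps on the P90, and assumes the newer chips overlap memory
traffic with arithmetic perfectly.  The function names and the operation
mix are made up for illustration.

```python
def serial_cycles(flops, loads, stores):
    """P90-style assumption: every operation waits its turn."""
    # 1 clock per FADD/FMUL, 1 clock per load, 2 clocks per store.
    return flops + loads + 2 * stores


def overlapped_cycles(flops, loads, stores):
    """Idealized PPro-style assumption: memory ops hide behind FP ops."""
    # The longer of the two streams (arithmetic vs. memory) sets the time.
    return max(flops, loads + 2 * stores)


# Hypothetical inner loop: 8 FADD/FMULs, 4 loads, 2 stores.
flops, loads, stores = 8, 4, 2
serial_rate = flops / serial_cycles(flops, loads, stores)
overlapped_rate = flops / overlapped_cycles(flops, loads, stores)
print(f"no overlap:   {serial_rate:.2f} FLOP/cycle")
print(f"full overlap: {overlapped_rate:.2f} FLOP/cycle")
```

For this made-up mix the non-overlapped case lands at half the
theoretical peak, while the fully overlapped case reaches it.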

Regards,
George

_________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers
