On 9 Apr 00, at 20:51, Stefan Struiker wrote:

> Working on nearly the same exponent, 9.7 million, I notice that
> our PIII 500 takes only 119.5 million clocks per iteration, whereas
> our 733EB requires 124.8 million to do the same.

The PIII 500/100 MHz FSB system equates to a PIII 666/133 MHz FSB 
system, i.e. they _should_ take the same number of clocks. With a 
higher multiplier, you will be waiting for extra clocks to access 
data when it isn't in one of the caches. Even a PIII 550 in the same 
MB as the PIII 500 would take _some_ extra clocks, though not enough 
to cancel out the extra CPU speed. Basically the point is that, with 
a lower instruction & operand fetch rate relative to the execution 
unit performance, the CPU is going to spend more of its time waiting 
for the pipeline to furnish new instructions and/or data values to 
work on.
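The arithmetic behind this is simple enough to sketch. The following is my own back-of-envelope illustration (not anything from George's code): the clock multiplier - CPU clocks elapsed per FSB clock - is a rough proxy for clocks wasted per off-chip access, and converting clocks-per-iteration to wall time uses the figures quoted above.

```python
# Editor's illustration using the figures from this thread.

def multiplier(cpu_mhz, fsb_mhz):
    """CPU clocks elapsed per FSB clock - roughly, clocks idled
    per access that has to go off-chip."""
    return cpu_mhz / fsb_mhz

def iteration_seconds(clocks, cpu_mhz):
    """Wall time per LL iteration, from clocks-per-iteration."""
    return clocks / (cpu_mhz * 1e6)

print(multiplier(500, 100))   # PIII 500/100:  5.0
print(multiplier(666, 133))   # PIII 666/133: ~5.0 - same wait per access
print(multiplier(733, 133))   # PIII 733EB:   ~5.5 - more clocks idled

# Despite burning more clocks per iteration, the 733EB is still
# well ahead in wall time:
print(iteration_seconds(119.5e6, 500))  # ~0.239 s per iteration
print(iteration_seconds(124.8e6, 733))  # ~0.170 s per iteration
```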

The smaller L2 cache probably doesn't help, even though it is running 
faster. Memory bandwidth is still wasted whenever there is an L2 
cache miss, which is going to happen more often with a 
smaller cache. However, this may not be too serious - 128K L2 cache 
Celerons perform quite well, and L2 cacheless Celerons are nowhere 
near as bad as you might suspect!

The chipset has a big effect, too. All systems using 133 MHz FSB run 
the memory asynchronously to the system bus. One would suspect that 
the hardware required to resynchronize might have a bad effect on the 
throughput. The way my PIII 533B / TMC TI6NBF+ (VIA chipset) system 
performs, using 133 MHz FSB & memory, leads me to suspect that 
there's a 100 MHz bottleneck somewhere in the memory subsystem!
> 
> The 733 has 256K full-speed on-chip cache and runs 256MB of RDRAM
> (RAMBUS), 133MHz bus and was not so cheap.

Wow, 256 MB RDRAM certainly _isn't_ cheap, I could buy a whole PIII 
500 system with the money I'd save if I bought 256 MB SDRAM instead!

Seriously, though, systems with more physical memory have bigger page 
tables, which can't help causing more page faults. Grossly 
excessive amounts of physical memory hurt performance (a bit) rather 
than help it. Depending on the OS you're running, how the program 
fits into the memory available can make a big difference too. Using 
Win 9x or NT WS 4.0, you can get up to 5% variation in iteration time 
_on the exact same hardware, on the exact same exponent, without even 
rebooting the system_. Starting Prime95 using my ReCache program 
helps the OS to load Prime95 efficiently, especially on systems with 
more than adequate physical memory. Win 2000 seems to manage well 
enough without ReCache. Linux systems don't seem to be bothered, 
their run speed is consistent to within 1%, but adding extra memory 
does still cause a modest slowdown.
> 
> An Athlon of my acquaintance does all of this in 114.6 million clocks, but
> that's another story, because the Athlon is supposed to decode more
> instructions per cycle, and generally ace it in the floating-point
> department..

I have an Athlon 650 and yes, it flies - it's turning in performance 
which equates to a PIII 733 extrapolated from George's benchmarks.
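As a sanity check on that equivalence, here is my own rough arithmetic (the clocks-per-iteration figures are the ones quoted earlier in this thread, not fresh measurements): wall time per iteration is just clocks-per-iteration divided by clock rate.

```python
# Editor's back-of-envelope check, using figures from this thread.
athlon_650 = 114.6e6 / 650e6   # ~0.176 s per iteration
piii_733   = 124.8e6 / 733e6   # ~0.170 s per iteration

# The two land within a few percent of each other, which is why an
# Athlon 650 sits in roughly PIII 733 territory on this workload.
print(athlon_650, piii_733)
```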

There are other factors than the "improved" CPU: the L1 cache is 32 
KB instead of 16 KB & I have a sneaky suspicion that the LL testing 
code runs with the vast majority of instruction fetches satisfied 
from L1, which reduces the cycles lost to L1 cache 
misses. And the internal bus is 128 bits wide instead of 64 bits, so 
you get more data shifted internally between CPU/L1 cache and (512 
KB) L2 cache. This more than compensates for the lower L2 cache 
speed.
> 
> Can anyone explain the Pentium discrepancy?

Not entirely, but I believe the points I've outlined above go some 
way.

Just out of interest, which MB are you using for the system with 
RDRAM in it, and what's the spec of the RDRAM itself? 

Regards
Brian Beesley
_________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers