> But Prime95 (or variants) is not typical in that it stresses CPU and L2
> cache performance. It isn't clear to me how the trade-off between the
> Celeron 333A with its 128K L2 running at CPU speed compares to the PII-333
> with the 512K L2 running at 1/2 the CPU speed when doing LL tests on
> exponents in the 6000000+ range. If Prime95 can shuffle info into and out
> of the 512K L2 of the PII but can't fit it all into the 128K L2 of the
> Celeron, and consequently has to read and write frequently to the slower
> SDRAM with the Celeron, then couldn't the PII be significantly faster for
> Prime95? I think real world tests are needed to determine the outcome. Of
> course, my understanding of how Prime95 operates is rudimentary and what I
> said above might be hogwash.
No, it isn't hogwash.
But if you're LL testing using a 256K FFT size, your work vectors
come to 4 MBytes. I reckon that, the way the program is structured,
you don't actually incur many extra L2 cache misses as a result of
having only 128K L2 cache instead of 512K L2 cache.
Many people are now running 384K FFTs, which makes the
inadequacy of even a 512K L2 cache more marked still.
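
To put rough numbers on that, here is a back-of-envelope sketch. It
takes the 4 MByte figure for a 256K FFT from above and assumes (this
is my assumption, not a measurement) that the working set grows
linearly with FFT size. The function names are made up for
illustration and have nothing to do with Prime95's own code.

```python
# Rough working-set arithmetic; all names here are illustrative only.
KBYTE = 1024
MBYTE = 1024 * KBYTE

# From the figure above: a 256K FFT needs about 4 MBytes of work vectors.
# Assumption: the working set scales linearly with FFT size.
BYTES_PER_FFT_ELEMENT = (4 * MBYTE) // (256 * KBYTE)


def working_set_bytes(fft_size_k):
    """Estimated work-vector size in bytes for an FFT of fft_size_k * 1024 elements."""
    return fft_size_k * KBYTE * BYTES_PER_FFT_ELEMENT


def cache_fraction(fft_size_k, l2_kbytes):
    """Fraction of the estimated working set that fits in an L2 cache of l2_kbytes."""
    return (l2_kbytes * KBYTE) / working_set_bytes(fft_size_k)


# Compare the Celeron (128K) and PII (512K) L2 sizes for both FFT sizes.
for fft_size_k in (256, 384):
    for l2_kbytes in (128, 512):
        share = cache_fraction(fft_size_k, l2_kbytes)
        print(f"{fft_size_k}K FFT, {l2_kbytes}K L2: {share:.1%} of working set")
```

On these assumptions, neither cache holds more than one eighth of
the working set at 256K, and the share shrinks further at 384K.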
The fact that the Mendocino chip (Celeron 300A, Celeron 333) runs
its L2 cache at core speed, rather than at half core speed as the
PII does, suggests that transplanting a C300A into a PII-300 system,
or a C333 into a PII-333, making NO other changes, might actually
*improve* the performance of Prime95, though other applications
may benchmark a few percent slower.
I'd be VERY interested to hear of anyone running Prime95 on a
Xeon - the Xeon has 512K, 1M or 2M of L2 cache running at core
speed (at painful, extortionate and positively obscene prices,
respectively) - benchmark comparisons with PII systems at the
same processor/bus speed would be interesting!
Regards
Brian Beesley