True. However, the size of the die was too large to make it economical for
anything but server usage. (die size = $$$) Plus, the Pentium Pro's cache,
as you state, was not integrated into the core so much as it was slapped
into the same package on a separate die. Therefore, it couldn't deliver the
huge bus width and low latency that truly integrated cache (first seen on the
Celeron A, of all things...) brought.

It was actually pretty close. The PPro, clock for clock, blew away the PII;
the Celeron A and PIII with integrated cache were of course a bit faster,
but not revolutionary. Those chips simply carried on where the PPro left
off. The P2 was in many respects a step backwards, especially for servers
back then. We've talked about how the PPro was insanely expensive to
manufacture, and, ironically, the move to integrated cache in newer-generation
chips was made not only for performance reasons but for cost savings as well.

Yes, the PII, without on-die or integrated cache, suffered from even lower
bandwidth to its cache than the PPro's in-package (but not integrated) cache
did, since the PII's off-die L2 ran at only half the core clock. With process
sizes as large as they were (500 and 350nm), it wasn't economical to put large
amounts of cache on the processor die. With 250nm and smaller processes,
however, the die shrank enough to make it economically viable. If the die size
is reasonable, it is certainly a lot cheaper to integrate the cache into the
core than to build a multi-chip processor 'package' à la Slot 1.

The PII was a definite step backwards with regard to the PPro. However, it
was necessary to keep processor costs down. As I'm sure you recall, the PPro
was marketed solely as a server chip, whereas the PII was aimed at desktops
and workstations and therefore had to hit lower price points than the PPro.
With the PPro's gigantic die size at the time, that simply wasn't possible.

Ideally, Intel would have continued the PPro, given it a 350/250nm process
shrink, and maintained it as the server line, but clearly they didn't. :)
Greg