Actually, while I/O is the classic example of why processor speed is not everything, you don't have to move far beyond the processor itself to show this. Note that the various types of servers have different sizes and structures of L1, L2, and L3 caches and different memory interfaces. Also note that memory latency and bandwidth vary from machine to machine. Finally, note that the faster the processor, the higher the latencies are in terms of cycles, because the latency in nanoseconds stays roughly fixed while each cycle gets shorter. Benchmarks are sensitive to this, but not uniformly. For example, TPC-C got a big boost from 8 MB L2 caches, but SPECint is almost totally insensitive to L2 size. Real work tends to show even more variability: there are more daemons, etc., chewing up cache space and causing more misses, and the code is not as extensively tuned. This tide will float all boats (even if you run TPC-C for a living, don't expect the tuned rates if you are also running security, monitoring, accounting, etc.), but the differences in memory hierarchy will cause some machines to be affected more than others. There is really no way except experience to tell how an application treats the memory hierarchy in this regard. As a result, defining relative capacity with a single metric is not possible, and any benchmark or metric suggested for this will not match the "real world", which exhibits more dynamic variability with both time and workload.
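To make the cycle-count point concrete, here is a back-of-the-envelope illustration in Python; the 100 ns memory latency and the clock rates are assumed numbers for illustration, not measurements of any particular machine.

def miss_penalty_cycles(latency_ns, clock_ghz):
    # A memory access that takes latency_ns costs latency_ns * clock_ghz cycles,
    # because a clock_ghz processor completes clock_ghz cycles per nanosecond.
    return latency_ns * clock_ghz

for ghz in (0.5, 1.0, 2.0):
    print(f"{ghz} GHz: {miss_penalty_cycles(100, ghz):.0f} cycles lost per 100 ns miss")

The same 100 ns miss costs twice as many cycles on the 2 GHz machine as on the 1 GHz machine, which is why faster processors feel memory-hierarchy pressure more, not less.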
One thing is for sure, though: if a cache is blown, then during the miss time the processor is 100% busy as measured by normal means, and the throughput is zero. You can see whether this is happening to a significant extent by plotting throughput versus processor utilization. (Throughput can come from the application or be estimated from the network data rate.) Draw linear, power, and logarithmic trend lines through the data. If the best trend is linear and the line intercepts the vertical axis near or below the origin, there is little or no "saturation" and little pressure on the memory hierarchy. If the best trend is logarithmic, there is heavy saturation, and chances are the workload is blowing the caches as load is applied. If the power curve fits best, the answer is somewhere in the middle: as the exponent of the power curve approaches 1, the workload exhibits less saturation, because the trend becomes linear. The more saturation a workload exhibits, the higher the utilization at which zLinux consolidation is viable, because the raw conversion factor is typically better for more saturated workloads. A rough sketch of this curve fitting appears below.

My point is that there is no such thing as a metric which defines the relative capacity of various servers, because of the differences in their memory hierarchies. Processor speed is an indicator, but it is insufficient for any real comparison. The processor speeds are what they are. You can measure them with MHz, BogoMIPS, SPECint, or "hello world" and a stopwatch, and you still will not understand the relative capacity for any particular piece of work, because work always causes the CPU speed differences to be diluted by other bottlenecks, such as waiting for memory or I/O, and the impact varies with workload, time, and machine architecture. The rules developed early on were based on the intuitive understanding that heavy computational work is less affected by non-processor bottlenecks, whereas transactional workloads with lots of locking and data sharing are more affected. Since most other machines come from a heritage which emphasizes the former, and the zSeries heritage is firmly in the latter, the intuitive choices are generally correct and don't change with normal evolutionary changes in processor speed.
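As an illustration of the trend fitting described above (not any official sizing tool), here is a minimal Python sketch. It assumes you already have paired samples of processor utilization and throughput and that numpy and scipy are available; the sample data at the end is made up purely to show usage.

import numpy as np
from scipy.optimize import curve_fit

def fit_trends(util, tput):
    # Fit linear, power, and logarithmic trends to throughput vs. utilization
    # and pick the one with the smallest residual sum of squares.
    util = np.asarray(util, dtype=float)
    tput = np.asarray(tput, dtype=float)
    models = {
        "linear": lambda x, a, b: a * x + b,
        "power": lambda x, a, b: a * np.power(x, b),
        "logarithmic": lambda x, a, b: a * np.log(x) + b,
    }
    fits = {}
    for name, f in models.items():
        params, _ = curve_fit(f, util, tput, p0=(1.0, 1.0), maxfev=10000)
        rss = float(np.sum((tput - f(util, *params)) ** 2))
        fits[name] = (params, rss)
    best = min(fits, key=lambda k: fits[k][1])
    a, b = fits[best][0]
    if best == "power":
        reading = "moderate saturation (exponent %.2f; closer to 1 means less)" % b
    elif best == "logarithmic":
        reading = "heavy saturation; the workload is likely blowing the caches"
    elif b <= 0:  # linear trend, intercept at or below the origin
        reading = "little or no saturation; little memory hierarchy pressure"
    else:
        reading = "linear trend with a positive intercept; interpret with care"
    return best, (a, b), reading

# Made-up samples for illustration: utilization in percent, throughput in tx/sec.
util = [10, 20, 30, 40, 50, 60, 70, 80, 90]
tput = [120, 230, 320, 390, 440, 470, 490, 500, 505]
print(fit_trends(util, tput))

The interpretation thresholds in the sketch simply restate the rules of thumb above; in practice you would eyeball the fitted curves as well as the residuals.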
Joe Temple
[EMAIL PROTECTED]
845-435-6301  295/6301
cell 914-706-5211  home 845-338-8794

Alan Altmark/Endicott/ [EMAIL PROTECTED]
Sent by: Linux on 390 Port <[EMAIL PROTECTED]IST.EDU>
10/29/2003 02:00 PM
Please respond to Linux on 390 Port
To: [EMAIL PROTECTED]
cc:
Subject: Re: Perpetuating Myths about the zSeries

On Wednesday, 10/29/2003 at 10:08 PST, Jim Sibley <[EMAIL PROTECTED]> wrote:

> I disagree about your "I/O wait is independent of
> processor speed" because that is only one component of
> response time. The faster processor can initiate I/Os
> faster and can service interrupts faster, thus
> reducing internal queue wait times.

The faster processor can start more I/Os per second than a slower processor. A faster I/O processor can move data off the channel into memory faster (and vice versa). The speed of the channel itself and of the device does not change. Yes, you CAN change it, but it is a function (and price) that is independent of CPU selection.

> As far as transaction transit time, that's a
> combination of cp processor speed, memory speed
> (whether or not the transaction or its data is paged),
> and I/O response.

YES! ABSOLUTELY! (With I/O response being an amalgam of channel speed, control unit speed and caching, device speed and caching, and contention.)

> Then the question still is: Are there workloads that
> we have measured in the past and written off as
> inappropriate for a zSeries that is now enabled with
> the latest zSeries technology?

I would only ask that you complete the picture by factoring in costs. Changes in the prices of energy, people, real estate, machines, etc., can bring on board workloads that were previously out of reach. This is the core of the TCO argument: Are you able to achieve acceptable results at a price you're willing to pay? And is the total cost (not just the acquisition cost of s/w & h/w) to deploy the workload less than what you pay now?

It's certainly easy to focus on just the technology, which can be used to make an initial cut (nope, probably still shouldn't use the mainframe for your next feature-length animated film), but let's not forget the other parts of the TCO equation.

Alan Altmark
Sr. Software Engineer
IBM z/VM Development