Alan wrote: "Of course, OSA and FCP QDIO (DMA) changes that picture a bit since all of a sudden the amount of data moving in/out is proportional to the CPU's ability to process the queues. Or is it? What if I have two CPUs operating a single DMA queue? Three CPUs? Gaaack!"
What about a 64-bit Linux DB2 server with several databases of several hundred gigabytes accepting queries from numerous Linux guests over hipersockets on a TREX? (31-bit Linux is limited by memory size in this scenario, not CP or I/O speed to the databases - though I/O speed to the swap devices is!)

"I keep trying to say that the speed of the processor is not the measure. It is the throughput of the workload that you are running that is important. If your application is I/O bound, faster processors just mean you have more free time to wait for the I/O to complete. On the other hand, if you have more than one process/virtual machine/LPAR, it means you can get more work done while waiting for the I/O to complete. (Assumption: I/O wait is independent of processor speed.)"

I disagree with your "I/O wait is independent of processor speed," because that is only one component of response time. The faster processor can initiate I/Os faster and can service interrupts faster, thus reducing internal queue wait times. (And if you put Linux on a 3390-9 image, that's what you're going to get: a lot of queue wait time on your databases!)

As far as transaction transit time, that's a combination of CP processor speed, memory speed (whether or not the transaction or its data is paged), and I/O response. The faster CPU reduces the CP part of the equation, the memory part is reduced by increasing the amount of memory (64-bit addressability) and the memory speed, and I/O speed is improved by faster I/O devices plus spreading the I/O over the channels and devices (i.e., Tuning 101). So increasing processor speed has a significant effect on transaction time.

As to transaction volume or rate, that also is a function of CP speed, the number of CPs, the memory speed, and the I/O rates. However, what tends to happen in a shop is that they put something like a Shark in, with its 16 "logical control units", and concentrate their data in a single LCU on a 3390-9 image!
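To put some toy numbers behind that argument, here is a minimal sketch of transit time as the sum of a CP term, a memory term, and an I/O term, where the I/O term includes queue wait on a busy device. All the figures are illustrative assumptions (not zSeries measurements), and the queue wait uses a simple M/M/1-style model just to show the shape of the curve:

```python
# Illustrative sketch only: decompose transaction transit time into
# CP, memory, and I/O components.  The I/O component includes queue
# wait on a shared subchannel, modeled M/M/1-style.  All numbers are
# made up for illustration, not measured on any real system.

def io_response_ms(service_ms, utilization):
    """Device service time plus queue wait: W = S * u / (1 - u)."""
    assert 0 <= utilization < 1, "queue saturates as utilization -> 1"
    return service_ms + service_ms * utilization / (1 - utilization)

def transit_ms(cp_ms, mem_ms, io_ms, cpu_speedup=1.0):
    """A faster processor shrinks only the CP term of the sum."""
    return cp_ms / cpu_speedup + mem_ms + io_ms

# Data concentrated on one busy LCU: 5 ms service at 80% utilization
io = io_response_ms(service_ms=5.0, utilization=0.8)        # 25.0 ms (5 service + 20 queued)

base = transit_ms(cp_ms=10.0, mem_ms=2.0, io_ms=io)                   # 37.0 ms
fast = transit_ms(cp_ms=10.0, mem_ms=2.0, io_ms=io, cpu_speedup=2.0)  # 32.0 ms
print(base, fast)
```

Doubling CP speed here only trims transit time from 37 ms to 32 ms, because the I/O term - mostly queue wait - dominates; spreading the data (or fixing the queueing) moves the needle far more than a faster engine.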
So who cares how fast your CPU is - it's waiting on the LCU, which is REALLY a physical CU: one RAID array, and a 3390-9 image is behind the same single subchannel as a 3390-3 image in Linux (unless you use Thoss's trick with VM managing the PAVs). Et voila, an I/O bottleneck, and a faster CP does in fact wait faster (or slower?)!

So, let's assume that the I/O subsystem, the CPs, and memory are tuned properly for the workload, rather than imposing a supposed workload on the model. (I know that there is no "ideal" workload - what one has to deal with is the workload of the people trying to run it on a given OS with given hardware.) Then the question still is: are there workloads that we measured in the past and wrote off as inappropriate for a zSeries that are now enabled by the latest zSeries technology? Some questions come to mind:

- Has the CP speed of the TREX (2084) enabled applications that were not acceptable on a previous machine?
- Have the latest improvements in Linux 2.4.19 and 2.4.21 and the IBM drivers enabled applications that were not acceptable on a previous release?
- Have the latest hipersockets implementations reduced CPU time and thus reduced "I/O wait" on the hipersockets?
- Has the greater backend bandwidth of the later zSeries (2 GByte vs 1 GByte) reduced the "I/O wait" to memory and the devices?
- Has a properly tuned TotalStorage Enterprise Storage Server 800 (Shark) with FICON reduced the "I/O wait" to the devices?
- Have the improvements to Java, DB2, WebSphere, etc., or has the processor speed of the TREX, enabled more applications?

Or, as Alan so rightly points out, "How many transactions per second can I handle?" and "At what cost?" (Reducing TCO is only acceptable as a measure if I can get as much or more work done as fast or faster.)

By the by, as to BogoMIPS repeatability: as with any other measure, if there is a competing load, then you expect the numbers to vary, sometimes greatly.
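On why the number is rock-steady on an idle box but swings under load: BogoMIPS is not measured per se - the kernel calibrates a busy-wait delay loop at boot (loops_per_jiffy) and derives BogoMIPS from it with a fixed formula. A competing load steals cycles during that calibration, so the derived figure moves with it. A sketch of the derivation, with the loops_per_jiffy value back-computed here for illustration rather than read from a real guest:

```python
# Sketch of how Linux derives BogoMIPS from its boot-time delay-loop
# calibration.  The kernel counts how many iterations of the delay
# loop fit in one timer tick (loops_per_jiffy); the constant 500000
# reflects the classic two-instruction loop body, so
# lpj * HZ / 500000 gives millions of "bogo-instructions" per second.

def bogomips(loops_per_jiffy, hz=100):
    return loops_per_jiffy * hz / 500000.0

# lpj = 11,993,050 on a 100 Hz kernel reproduces the 2398.61 figure
# quoted below (this lpj is back-derived for illustration).
print(round(bogomips(11_993_050), 2))
```

Under a steady LPAR the calibration sees the same cycle budget every IPL, so the same loops_per_jiffy - and the identical BogoMIPS value - falls out every time; under a loaded VM the calibration window is squeezed and the number drops, sometimes drastically.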
In a controlled lab, the BogoMIPS number is repeatable and consistent between single processors. For example, the environment: 4 shared CPs on a TREX 16-CP processor, with 5 active LPARs; the HMC shows an average of 26% total CP utilization.

3 IPLs in an LPAR:
  2398.61 bogomips
  2398.61    "
  2398.61    "

3 IPLs under VM:
  2398.61 bogomips
  2398.61    "
  2398.61    "

0% error, 100% repeatable! The 5-10% error I mentioned in an earlier note is with a heavier load on the LPARs. The error for BogoMIPS under a loaded VM can be as much as a factor of 9 or more!

=====
Jim Sibley
Implementor of Linux on zSeries in the beautiful Silicon Valley

"Computers are useless. They can only give answers." - Pablo Picasso
