Alan wrote:

"Of course, OSA and FCP QDIO (DMA) changes that picture a bit
since all of a sudden the amount of data moving in/out is
proportional to the CPU's ability to process the queues.  Or is
it?  What if I have two CPUs operating a single DMA queue?
Three CPUs?  Gaaack!"

What about a 64-bit Linux DB2 server with several databases of
several hundred gigabytes accepting queries from numerous Linux
guests over hipersockets on a TREX? (31-bit Linux is limited by
memory size in this scenario, not by CP or I/O speed to the
databases - though I/O speed to the swap devices is a limit!)

"I keep trying to say that the speed of the processor is not
the measure.  It is the throughput of the workload that you are
running that is important.  If your application is I/O bound,
faster processors just mean you have more free time to wait for
the I/O to complete.  On the other hand, if you have more than
one process/virtual machine/lpar, it means you can get more
work done while waiting for the I/O to complete.  (Assumption:
I/O wait is independent of processor speed.)"

I disagree with your "I/O wait is independent of processor
speed" because that is only one component of response time. The
faster processor can initiate I/Os faster and can service
interrupts faster, thus reducing internal queue wait times.

(And if you put Linux on a 3390-9 image, that's what you're
going to get - a lot of queue wait time on your databases!)

As far as transaction transit time goes, that's a combination
of CP processor speed, memory speed (whether or not the
transaction or its data is paged), and I/O response. The faster
CPU reduces the CP part of the equation, the memory part is
reduced by increasing the amount of memory (64-bit
addressability) and the CP memory speed, and I/O speed is
improved by faster I/O devices plus the spread of I/O over the
channels and devices (i.e., tuning 101).

So increasing processor speed has a significant effect
on transaction time.

As to transaction volume or rate, that also is a function of CP
speed, the number of CPs, the memory speed, and the I/O rates.

However, what tends to happen in a shop is that they put
something like a shark in, with its 16 "logical control units,"
and concentrate their data in a single LCU on a 3390-9 image!
So who cares how fast your CPU is - it's waiting on the LCU,
which is REALLY a physical CU - one RAID array, and a 3390-9
image sits behind the same single subchannel as a 3390-3 image
in Linux (unless you use Thoss's trick with VM managing the
PAVs). Et voila, an I/O bottleneck, and a faster CP does in
fact wait faster (or slower?)!
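A back-of-the-envelope queueing sketch shows why the single
subchannel dominates. This uses a textbook M/M/1 model as an
illustrative assumption, not a model of a real shark LCU:

```python
# Illustrative M/M/1 queue: with all I/O funneled through one
# subchannel, time in system explodes as utilization approaches
# 1.0 - and no amount of CPU speed helps with that.

def mm1_time_in_system(service_ms, utilization):
    """Mean time in system for an M/M/1 queue: S / (1 - rho)."""
    assert 0.0 <= utilization < 1.0
    return service_ms / (1.0 - utilization)

for rho in (0.5, 0.8, 0.95):
    print(rho, mm1_time_in_system(5.0, rho))
```

At 95% utilization the same 5 ms device service time turns into
roughly 100 ms in the queue - hence "the faster CP waits faster."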

So, let's assume that the I/O subsystem, the CPs, and memory
are tuned properly for the workload, rather than imposing a
supposed workload on the model. (I know that there is no
"ideal" workload - what one has to deal with is the workload of
the people trying to run it on a given OS with given hardware.)

Then the question still is: Are there workloads that we have
measured in the past and written off as inappropriate for
zSeries that are now enabled by the latest zSeries technology?

Some questions come to mind:

- Has the CP speed of the TREX (2084) enabled
applications that were not acceptable on a previous
machine?

- Have the latest improvements in Linux 2.4.19 and
2.4.21 and the IBM drivers enabled applications that
were not acceptable on a previous release?

- Has the latest hipersockets implementation reduced CPU time
and thus reduced "I/O wait" on the hipersockets?

- Has the greater backend bandwidth of the later
zSeries (2 Gbyte vs 1 Gbyte) reduced the "I/O wait" to
memory and the devices?

- Has a properly tuned Total Storage Server 800 (shark) with
FICON reduced the "I/O wait" to the devices?

- Have the improvements to Java, DB2, WebSphere, etc., or has
the processor speed of the TREX enabled more applications?

Or, as Alan so rightly points out, "How many
transactions per second can I handle" and "At what
cost?" (Reducing TCO is only acceptable as a measure
if I can get as much or more work done as fast or
faster).

By the by, as to Bogomip repeatability, as in any
other measure, if there is a competing load, then you
expect the numbers to vary, sometimes greatly. In a
controlled lab, the bogomip number is repeatable and
consistent between single processors. For example:

environment: 4 shared CPs on a 16-CP TREX, with 5 active LPARs.
The HMC shows an average of 26% total CP utilization.

3 IPLs in an LPAR:
2398.61 bogomips
2398.61 "
2398.61 "

3 IPLs under VM:
2398.61 bogomips
2398.61 "
2398.61 "

0% error, 100% repeatable!

The 5-10% error I mentioned in an earlier note is with a
heavier load on the LPARs. The error under a loaded VM for
bogomips can be as much as a factor of 9 or more!
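For anyone who wants to repeat the measurement, here is a
sketch of pulling the figure from /proc/cpuinfo. The sample
text is illustrative and the exact field layout varies by
architecture; on a real system you would read the file itself:

```python
# Extract the bogomips figure per processor from /proc/cpuinfo
# style text. SAMPLE_CPUINFO is made-up illustrative input; on a
# live Linux system use open("/proc/cpuinfo").read() instead.

SAMPLE_CPUINFO = """\
processor       : 0
bogomips        : 2398.61
processor       : 1
bogomips        : 2398.61
"""

def bogomips(cpuinfo_text):
    """Return the bogomips value for each processor listed."""
    return [float(line.split(":", 1)[1])
            for line in cpuinfo_text.splitlines()
            if line.startswith("bogomips")]

print(bogomips(SAMPLE_CPUINFO))  # [2398.61, 2398.61]
```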


=====
Jim Sibley
Implementor of Linux on zSeries in the beautiful Silicon Valley

"Computers are useless. They can only give answers." Pablo Picasso
