Actually, while I/O is the classic example of why processor speed is not
everything, you don't have to move far beyond the processor itself to show
this.  Note that the various types of servers have different sizes and
structures of L1, L2, and L3 caches and different memory interfaces.  Also
note that memory latency and bandwidth vary from machine to machine.
Finally, note that the faster the processor, the higher the latencies are
in terms of cycle counts.  Benchmarks are sensitive to this, but not
uniformly.  For example, TPC-C got a big boost from 8MB L2 caches, but
SPECint is almost totally insensitive to L2 size.  Real work tends to show
even more variability: there are more daemons, etc., chewing up cache space
and causing more misses, and the code is not as extensively tuned.  This
tide will float all boats (even if you run TPC-C for a living, don't expect
the tuned rates if you are also running security, monitoring, accounting,
etc.).  However, the differences in memory hierarchy will cause some
machines to be impacted more than others.  There is really no way except
experience to tell how an application treats the memory hierarchy in this
regard.  As a result, defining relative capacity with a single metric is
not possible, and any benchmark or metric suggested for this will not match
the "real world", which exhibits more dynamic variability with both time
and workload.
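
To put the cycle-count point in concrete terms, here is a toy calculation
(the 100 ns latency and the clock speeds are invented for illustration, not
measured on any particular machine): a memory latency that stays flat in
nanoseconds grows linearly in cycles as the clock gets faster, so each miss
idles a fast processor for proportionally more of its potential work.

    # Toy numbers for illustration only: a fixed memory latency in
    # nanoseconds costs more cycles per miss as the clock gets faster.
    MEMORY_LATENCY_NS = 100.0   # assumed flat across machines

    for clock_ghz in (1.0, 2.0, 4.0):
        cycles_per_miss = MEMORY_LATENCY_NS * clock_ghz
        print(f"{clock_ghz:.0f} GHz clock: ~{cycles_per_miss:.0f} cycles lost per miss")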

One thing is for sure though: if the cache is blown, then during the miss
time the processor is 100% busy as measured by normal means, but the
throughput is zero.  You can see whether this is happening to a significant
extent by plotting throughput vs. processor utilization.  (Throughput can
come from the application or be estimated from the network data rate.)
Draw linear, "power", and logarithmic trends through the data.  If the best
trend is linear and the line intercepts the vertical axis near or below the
origin, then there is little or no "saturation" and little pressure on the
memory hierarchy.  If the best trend is logarithmic, then there is heavy
saturation and chances are the workload is blowing the caches as load is
applied.  If the power curve fits best, the answer is somewhere in the
middle; as the exponent of the power curve approaches 1, the workload is
exhibiting less saturation, because the trend becomes linear.  The more
saturation exhibited, the higher the utilization at which zLinux
consolidation is viable, because the raw conversion factor is typically
better for more saturated workloads.
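
For what it's worth, here is a minimal sketch of that curve-fitting
exercise in Python (the sample numbers and the names util/tput are mine,
invented for illustration, and scipy is assumed to be available; real data
would come from your own monitors), given per-interval samples of CPU
utilization and throughput:

    import numpy as np
    from scipy.optimize import curve_fit

    # Per-interval samples: CPU utilization (%) and throughput (e.g. tx/s
    # or network bytes/s).  These numbers are made up for illustration.
    util = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90], dtype=float)
    tput = np.array([120, 230, 330, 420, 500, 560, 610, 645, 665], dtype=float)

    def linear(x, a, b):      return a * x + b
    def power(x, a, b):       return a * np.power(x, b)
    def logarithmic(x, a, b): return a * np.log(x) + b

    def fit(model, x, y, p0):
        params, _ = curve_fit(model, x, y, p0=p0, maxfev=10000)
        resid = y - model(x, *params)
        return params, float(np.sum(resid ** 2))

    fits = {
        "linear":      fit(linear,      util, tput, p0=(1.0, 0.0)),
        "power":       fit(power,       util, tput, p0=(1.0, 1.0)),
        "logarithmic": fit(logarithmic, util, tput, p0=(1.0, 0.0)),
    }

    best = min(fits, key=lambda name: fits[name][1])
    for name, (params, sse) in fits.items():
        print(f"{name:12s} params={np.round(params, 3)}  SSE={sse:10.1f}")
    print(f"best fit: {best}")

    # Interpretation per the text: a linear best fit whose intercept is at
    # or below zero suggests little memory-hierarchy pressure; a
    # logarithmic best fit suggests heavy saturation (caches being blown as
    # load rises); a power fit with exponent approaching 1 sits in between,
    # closer to linear.
    if best == "linear":
        print("intercept:", round(fits["linear"][0][1], 2))
    elif best == "power":
        print("exponent:", round(fits["power"][0][1], 3))

Eyeballing the plot works just as well; the point is only which family of
curve tracks the data and where the linear fit crosses the axis.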

My point is that there is no single metric that defines the relative
capacity of various servers, because of the differences in their memory
hierarchies.  Processor speed is an indicator, but it is insufficient for
any real comparison.  The processor speeds are what they are.  You can
measure them with MHz, BogoMIPS, SPECint, or "hello world" and a stopwatch,
and you still will not understand the relative capacity for any particular
piece of work once you do.  This is because work always causes CPU speed
differences to be mitigated by other bottlenecks, such as waiting for
memory or I/O, and the impact varies with workload, time, and machine
architecture.  The rules developed early on were based on the intuitive
understanding that heavy computational work is less impacted by
non-processor bottlenecks, whereas transactional workloads with lots of
locking and data sharing are more impacted.  Since most other machines come
from a heritage which emphasizes the former, and the zSeries heritage is
firmly in the latter, the intuitive choices are generally correct and don't
change with normal evolutionary changes in processor speed.


Joe Temple
[EMAIL PROTECTED]
845-435-6301  295/6301   cell 914-706-5211 home 845-338-8794



Alan Altmark/Endicott/[EMAIL PROTECTED], sent by Linux on 390 Port
<[EMAIL PROTECTED]>, wrote on 10/29/2003 02:00 PM
To: [EMAIL PROTECTED]
Subject: Re: Perpetuating Myths about the zSeries
Please respond to Linux on 390 Port

On Wednesday, 10/29/2003 at 10:08 PST, Jim Sibley
<[EMAIL PROTECTED]> wrote:

> I disagree about your "I/O wait is independent of
> processor speed" because that is only one component of
> response time. The faster processor can initiate I/O's
> faster and can service interrupts faster, thus
> reducing internal queue wait times.

The faster processor can start more I/Os per second than a slower
processor.   A faster I/O processor can move data off the channel into
memory faster (and vice versa).  The speed of the channel itself and the
device does not change.  Yes, you CAN change it, but it is a function (and
price) that is independent of CPU selection.

> As far as transaction transit time, that's a
> combination of cp processor speed, memory speed
> (whether or not the transaction or its data is paged),
> and I/O response.

YES!  ABSOLUTELY!  (With I/O response being an amalgam of channel speed,
control unit speed and caching, device speed and caching, and contention.)
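
Just to illustrate that amalgam with a toy model (all numbers and names
below are invented, not measured): only the CPU and memory portions of a
transaction's response time shrink with a faster processor; the channel/
control unit/device portion stays put, which is why it is priced and chosen
independently of the CPU.

    # Toy response-time breakdown in milliseconds (numbers invented).
    CPU_SERVICE_MS  = 2.0    # scales with processor speed
    MEMORY_STALL_MS = 1.0    # tied to the memory interface, not just the clock
    IO_RESPONSE_MS  = 12.0   # channel + control unit + device + contention

    def response_time_ms(cpu_speedup):
        # Only the CPU component is divided by the speedup; memory and I/O
        # are held constant here, which is a simplification but makes the
        # point about where the bottleneck sits.
        return CPU_SERVICE_MS / cpu_speedup + MEMORY_STALL_MS + IO_RESPONSE_MS

    for speedup in (1, 2, 4):
        print(f"{speedup}x faster CPU -> {response_time_ms(speedup):.1f} ms")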

> Then the question still is: Are there workloads that
> we  have measured in the past and written off as
> inappropriate for a zSeries that is now enabled with
> the latest zSeries technology.

I would only ask that you complete the picture by factoring in costs.
Changes in prices of energy, people, real estate, machines, etc., can
bring on board workloads that were previously out of reach.  This is the
core of the TCO argument.  Are you able to achieve acceptable results at a
price you're willing to pay?  And is that total cost (not just acquisition
costs of s/w & h/w) to deploy the workload less than you pay now?

It's certainly easy to focus on just the technology, which can be used to
make an initial cut (nope, probably still shouldn't use the mainframe for
your next feature-length animated film) but let's not forget the other
parts of the TCO equation.

Alan Altmark
Sr. Software Engineer
IBM z/VM Development
