Joe,

In the past, %TPI was a good indicator that IO was arriving for an LPAR but
it did not have a logical CP dispatched by PR/SM to accept the interrupt.
This creates a "wall of interrupts" effect when the LCP is dispatched and
starts finding pending interrupts with the TPI process.

The recommendation for CPENABLE changes every few years, but past wisdom was
to have all LCPs in an LPAR available to handle IO interrupts with
CPENABLE(0,0). My take on the logic is that because PR/SM dispatches logical
CPs and not LPARs, the first LCP dispatched for an LPAR can tackle the IO
interrupts. If only LCP0 is enabled for IO interrupts, then they will queue
while that LCP is not dispatched.
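
To make that queuing argument concrete, here is a toy sketch in Python. It
is not a model of how PR/SM or CPENABLE actually work: the round-robin
dispatch of one LCP per tick, the one-interrupt-per-tick arrival rate, and
the simulate() helper itself are all assumptions made up for illustration.
It only shows that interrupts pile up and wait longer when fewer of the
LPAR's LCPs are allowed to take them.

```python
from collections import deque


def simulate(n_lcps, enabled, ticks):
    """Toy model (all assumptions): one I/O interrupt arrives per tick and
    LCP number (t % n_lcps) is the one dispatched at tick t."""
    pending = deque()   # arrival ticks of interrupts not yet accepted
    waits = []          # ticks each accepted interrupt spent pending
    max_burst = 0       # largest batch found pending at one dispatch
    for t in range(ticks):
        pending.append(t)
        dispatched = t % n_lcps
        if dispatched in enabled:
            # An enabled LCP drains everything that queued up meanwhile.
            max_burst = max(max_burst, len(pending))
            while pending:
                waits.append(t - pending.popleft())
    mean_wait = sum(waits) / len(waits) if waits else 0.0
    return mean_wait, max_burst


# Hypothetical LPAR with 4 LCPs: all enabled versus only LCP0 enabled.
all_enabled = simulate(4, {0, 1, 2, 3}, 400)
only_lcp0 = simulate(4, {0}, 400)
print("all LCPs enabled :", all_enabled)
print("only LCP0 enabled:", only_lcp0)
```

In this sketch the all-enabled case never lets an interrupt wait, while the
LCP0-only case finds a small "wall" of pending interrupts each time LCP0
comes around. Real dispatch intervals are uneven, so take the shape of the
result, not the numbers.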

This is my best take on this, but I'm not a CPU guy. Anyone who knows
better should correct me post-haste.

Ron 

> -----Original Message-----
> From: IBM Mainframe Discussion List [mailto:[email protected]] On
> Behalf Of Joe Owens
> Sent: Wednesday, February 15, 2012 5:11 AM
> To: [email protected]
> Subject: [IBM-MAIN] Z/architecture I/O questions
> 
> Hi List,
> I've got a few questions about how z hardware handles I/Os and LPAR
> dispatching.
> 
> I've done a fair bit of reading, but there are still some things I don't
> understand. We are on Z9s. We are using shared CPs. We are not using IRD.
> We have 2 large production LPARs and several smaller LPARs. The 2 prod
> LPARs have substantially different weights, 1:4, due to the CPU workload
> spread. We also use group capacity limits and an individual capacity limit
> on the largest LPAR. While the CPU balance is different, the I/O profile
> is similar, about 5-6000 IOPS on each LPAR.
> 
> We have 11 logical CPs active on each of the 2 LPARs. We expect peak 4hra
> of 90 and 400 MSUs, and the weights are set to reflect this.
> 
> The work between the 2 LPARs is split for licencing. The small LPAR is
> mainly batch, the large LPAR is online and batch.
> 
> I believe we are seeing I/O elongation on the smaller LPAR at peak times,
> particularly when the systems are capped. An I/O-bound batch job may run
> 2-3 times longer on the small LPAR when the systems are busy. The I/O
> response times look slightly worse on the small LPAR, but the throughput
> is much worse.
> 
> So here are my questions.
> 
> My understanding of the channel program is that it moves the data into the
> page-fixed I/O buffer and then interrupts a CP to process the I/O. How is
> the candidate CP chosen? I know the z/OS system may make some CPUs
> uninterruptible for I/Os based on CPENABLE, but of the CPs that are
> enabled, how is one chosen? Is it at the physical or logical level, and
> how is it related to the LPAR which requested the I/O?
> 
> We have CPENABLE set to (10,30). RMF shows all 11 logical CPUs are taking
> interrupts (and have TPI counts), but CP A (highest number) is doing by
> far the most. Could this be a cause of contention between the 2 LPARs, or
> will they likely be dispatched on separate physical CPs?
> 
> The next question is about the dispatch time given to an LPAR for a CP by
> PR/SM. The PR/SM planning guide says the maximum time may be between 12.5
> and 25ms (a lot longer than an I/O). I am thinking that if an LPAR is
> constrained by capping, it is more likely to have a queue of ready work
> and hold on to a CP towards the maximum when it is given one?
> 
> How can I tell for sure if I am on the right track, any metrics that will
> prove what is causing the longer elapsed times on one LPAR?
> 
> What is the best way to stop it or reduce it, given that we have to run
> capped on peak days, we have to live with the workload separation, and we
> don't have the capacity for dedicated CPUs?
> 
> Would WLM/IRD management of CPUs help?
> Would offlining logical CPs help? We have many more CPs online to each
> LPAR than its normal MSU usage, but it gives flexibility for workload
> peaks.
> Would offlining specific logical CPs help? I.e., if the 2 LPARs had a
> different highest logical CP number, would this reduce contention, or is
> it again likely to use different physical CPs for different LPARs?
> Would tuning of the LPAR dispatch time help? We do not specify this, and
> the recommendation is to let the system choose.
> 
> Sorry about the length of the post, but hopefully someone will find this
> an interesting problem.
> 
> Joe Owens
> 
> 
> 
> .
> 
> ----------------------------------------------------------------------
> For IBM-MAIN subscribe / signoff / archive access instructions, send email
> to [email protected] with the message: INFO IBM-MAIN