Don,

My two cents

> 
> I do not know what the potential performance limiting factors are for
> the following:
> 1. The FICON cable/protocol
[Ron Hawkins] The microprocessors on the Host and on the storage are the
first limiting factor. Saturating these usually leads to an increase in
connect time and/or Pend time, depending on how and when the vendor responds
to channel commands.

Next is block sizes, still the bane of serial channels: the cost of
processing Start Subchannel commands is still far greater than the transfer
time of unchained IO smaller than 4KB.

Finally I'd say Open Exchanges (OE), especially when many channels are
fanned into a single FICON card. For example, when you have four or more
CECs each running their own channels into a switch, with those channels
fanned into a single port. That's sixteen channels, each capable of 64 OE,
into four ports: a potential total of 1024 OE (16 x 64) into a single card.
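
The fan-in arithmetic above can be sketched in a few lines of Python. The
figures (four CECs, sixteen channels, 64 OE, four ports) come from the
example; the even split of four channels per CEC and per port is my
assumption, and the variable names are only illustrative.

```python
# Fan-in arithmetic from the example above.
# Assumption: the 16 channels are split evenly, four per CEC and four per
# storage port, and each channel can hold up to 64 Open Exchanges (OE).
cecs = 4
channels_per_cec = 4          # assumed: 16 channels / 4 CECs
oe_per_channel = 64
ports_on_card = 4

channels = cecs * channels_per_cec
channels_per_port = channels // ports_on_card
max_oe_per_port = channels_per_port * oe_per_channel
max_oe_on_card = channels * oe_per_channel

print(f"{channels} channels, {channels_per_port} per port")
print(f"up to {max_oe_per_port} OE per port, {max_oe_on_card} OE per card")
```

This is only the worst case in which every channel drives its full OE count
at the same moment.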

  
> 2. The IBM's processor channel card
[Ron Hawkins] In the past Cathy Cronin has recommended keeping Channel and
Bus MP Busy below 50%. I'm not going to argue with her. 

In single channel tests on z9 and USP-V the Host Channel MP saturates before
the Storage, so I'm able to measure the effects of a single saturated
volume. This was a 4KB read/write workload spread across 1024 volumes with
some volumes having skewed IO and a variety of read cache misses (avg 74%).
From around 70% Host Channel MP busy I see OE hitting 40-45 and Pend time
growing to 3-5ms. CMR time is staying constant at 1ms, which indicates the
delays are at the Host end rather than in the storage.

At >95% Channel MP busy I'm averaging 45-50 OE and PEND time has jumped to
45ms, with CMR at 1-2ms. At this point the SAP assigned to that channel had
also jumped to 96% busy with almost no other IO activity running on the CEC.

That's a very nasty response time for 12K IOPS using 4KB blocks. To give
some idea of how response time tracked Channel MP Busy:
        IOPS    CHAN%   AVGRSPMS
        1K      9%      1.796
        3K      27%     1.671
        6K      51%     1.853
        9K      72%     2.880
        12042   91%     6.686
        12100   96%     47.77

So for this workload we see response time has jumped by about 1ms going from
51% to 72% channel busy, and things get worse from there. It suggests that
Cathy Cronin's recommendation is spot on. And if you are trying to get that
last 10% of activity out of your channels, you may want to think again. The
interaction of Channel usage, retries, IOP % Busy and OE certainly needs
some research, but doing this on a z9 is not timely. Suffice to say that
keeping FICON channels below 90% at peak, and below 50% over a 15 minute
interval, should keep you out of trouble.
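
As a rough check on the table above, this Python sketch computes the change
in average response time between adjacent rows, plus an implied Channel MP
time per IO. The numbers are copied from the table; reading "1K", "3K" and
so on as round thousands is my assumption, and the per-IO figure assumes
busy % is simply IOPS times a fixed service time (the utilization law),
which is an approximation and not something I measured directly.

```python
# Rows copied from the table: (IOPS, channel MP busy %, avg response ms).
# Assumption: "1K", "3K", ... are read as round thousands.
rows = [
    (1000, 9, 1.796),
    (3000, 27, 1.671),
    (6000, 51, 1.853),
    (9000, 72, 2.880),
    (12042, 91, 6.686),
    (12100, 96, 47.77),
]

def response_deltas(rows):
    """Return (busy_from, busy_to, delta_ms) for each pair of adjacent rows."""
    out = []
    for (_, busy_a, rsp_a), (_, busy_b, rsp_b) in zip(rows, rows[1:]):
        out.append((busy_a, busy_b, round(rsp_b - rsp_a, 3)))
    return out

def per_io_busy_us(rows):
    """Implied channel MP microseconds per IO, assuming busy = IOPS * time."""
    return [round(busy / 100.0 / iops * 1e6, 1) for iops, busy, _ in rows]

for busy_a, busy_b, delta in response_deltas(rows):
    print(f"{busy_a:>2}% -> {busy_b:>2}% busy: {delta:+.3f} ms")

for (iops, busy, _), usec in zip(rows, per_io_busy_us(rows)):
    print(f"{iops:>5} IOPS at {busy:>2}% busy: ~{usec} us of channel MP per IO")
```

The step sizes make it easy to see where the curve turns upward, and the
per-IO estimate gives a feel for how much channel MP time each 4KB IO costs
in this test.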

NB: These are all one minute intervals. It may be worth checking whether
longer 15 minute intervals are masking this sort of high Pend time caused by
high activity on the Host Channel MP.
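
To illustrate how a longer interval could mask a bad minute, here is a small
Python sketch. The mix is hypothetical: I take fourteen one-minute samples at
the 6K IOPS row of the table and one sample at the 12100 IOPS row, and roll
them up into a single 15 minute figure. Real RMF intervals will not look
exactly like this.

```python
# Hypothetical 15 minute interval built from two rows of the table above:
# 14 one-minute samples at 6K IOPS and one sample at 12100 IOPS.
# Each sample is (IOPS, channel MP busy %, avg response ms).
quiet = (6000, 51, 1.853)
spike = (12100, 96, 47.77)
samples = [quiet] * 14 + [spike]

def summarize(samples):
    """Roll equal-length one-minute samples up into a single interval.

    Busy % is a plain mean; response time is weighted by IO rate so that
    busier minutes count for more.
    """
    total_io = sum(iops for iops, _, _ in samples)
    avg_busy = sum(busy for _, busy, _ in samples) / len(samples)
    avg_rsp = sum(iops * rsp for iops, _, rsp in samples) / total_io
    worst_rsp = max(rsp for _, _, rsp in samples)
    return avg_busy, avg_rsp, worst_rsp

avg_busy, avg_rsp, worst_rsp = summarize(samples)
print(f"15 min view: {avg_busy:.0f}% busy, {avg_rsp:.2f} ms average response")
print(f"worst one-minute sample: {worst_rsp:.2f} ms")
```

With this made-up mix the 15 minute view sits only a little above 50% busy,
while one of the minutes inside it was running at 96% with a response time
far worse than the interval average.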


> 3. The IBM's storage channel cards
[Ron Hawkins] I am not familiar with the DS8700 yet, but for the DS8300 the
channel card had a single 1GHz MP shared by all four channels. IBM
recommended using only two ports at 2Gb or one port at 4Gb with this card,
and I believe it had to do with the throughput of this MP as observed in the
performance numbers they published comparing 2Gb and 4Gb HCA back in 2007.


> 4. Any switches along the path
[Ron Hawkins] I've never tried to saturate the backplane of a switch to see
how Host and Storage respond. I've found switches to be transparent to
response time and throughput on FICON. The only serious performance problem
I've heard of was one switch vendor that negotiated eight buffer credits
with the Host, so that IO on the channel was limited to just one or two OE.

Ron

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html
