Hi,
I have noticed recurring enqueue contention on SYSZIGDI, and I
think I have found the cause in APAR OA14084.
It got a hard look when it cropped up during another problem, which we now
think is unrelated, but it is "interesting".
If you are interested in things that affect the RNLs you use, or you are
seeing delays when you issue SMS commands, this may be of interest to you.
Best Regards,
Sam Knutson, GEICO
Performance and Availability Management
mailto:[EMAIL PROTECTED]
(office) 301.986.3574
Our life is frittered away by detail. Simplify, simplify. Henry David
Thoreau (1817-1862)
APAR Identifier ...... OA14084 Last Changed ........ 06/03/01
ZOS 1.5 HDZ11H0 CHANGED THE ENQUE SYSZIGDI FROM LOCAL (SYSTEM)
TO GLOBAL (SYSTEMS), CAUSING LARGE INCREASE OF ENQ'S PROPAGATED
Symptom ...... WS WAITxxx Status ........... CLOSED PER
Severity ................... 3 Date Closed ......... 05/12/09
Component .......... 5695DF101 Duplicate of ........
Reported Release ......... 1H0 Fixed Release ............ 999
Component Name STORAGE MGMT SU Special Notice HIPER
Current Target Date ..06/01/08 Flags
SCP ................... PERFORMANCE
Platform ............
Status Detail: SHIPMENT - Packaged solution is available for
shipment.
PE PTF List:
PTF List:
Release 1H0 : UA23264 available 06/02/08 (F602 )
Release 1J0 : UA23265 available 06/02/08 (F602 )
Release 1K0 : UA23266 available 06/02/08 (F602 )
Parent APAR:
Child APAR list:
ERROR DESCRIPTION:
Environment: z/OS R1.5 and above.
Problem Description:
z/OS 1.5 HDZ11H0 changed the enqueue SYSZIGDI from local (system)
to global (systems), causing a large increase in propagated ENQs.
This can cause increased contention, increased CPU
consumption, and intermittent ESQA exhaustion, which in turn can cause
system hang and IPL symptoms.
Note: This can heavily impact the length of time required for a
command that updates the COMMDS to propagate around
all members of an SMSPLEX. An example of such a command is
VARY SMS.
Additional keywords:
Disabled wait 0101 WAIT101 "long vary propagation time around
smsplex"
Additional symptoms:
1. UCBSMS bit is off (UCB FL5 = x'88' vs x'A8') in an
unpredictable fashion, due to delay in SMS VARY processing.
2. SMS configuration changes such as activate and vary processing
are excessively delayed (some clients have reported hours). This
is also related to extremely busy GRS processing, and is
considered a "trigger event" which can contribute to a
"bottleneck" for other applications which are users of GRS services.
3. System hang, Standalone dump SAD with Svcdump title:
'END OF MEMORY RESOURCE MANAGER HANG DETECTED: TCB = 008CA7D0,
NAME = ISGGTRM0- SCSDS
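
As a reading aid for symptom 1, here is a minimal Python sketch that
compares the two UCB FL5 values quoted above. It only does the bit
arithmetic on those two bytes. That the single differing bit is the
UCBSMS flag is inferred from the wording of the symptom, not taken from
a control block mapping, and the names FL5_BIT_OFF, FL5_BIT_ON and
has_flag are made up for illustration.

```python
# The two UCB FL5 byte values quoted in symptom 1 of the APAR text.
FL5_BIT_OFF = 0x88  # value reported when the bit is unexpectedly off
FL5_BIT_ON = 0xA8   # value the APAR text contrasts it with

# XOR isolates the bits that differ between the two bytes.
differing_bits = FL5_BIT_OFF ^ FL5_BIT_ON


def has_flag(fl5: int, mask: int) -> bool:
    """Return True when every bit in mask is set in the flag byte."""
    return fl5 & mask == mask


print(f"differing mask: x'{differing_bits:02X}'")
print(has_flag(FL5_BIT_OFF, differing_bits))
print(has_flag(FL5_BIT_ON, differing_bits))
```

The two values differ in exactly one bit, so a dump showing x'88' where
x'A8' is expected is missing only that one flag.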
Recommendation: Implement the exclude RNL. Be aware the RNL
must be completely in place around the sysplex to be completely
effective. Implement it during a period of low SYSZIGDI (or
other) enq activity, because heavy enq activity can prevent the
RNL's activation.
LOCAL FIX:
Note, Step 1 is most helpful only if you are undergoing current
system slowdown due to GRS being flooded with SYSZIGDI, which is
impacting other workload.
Note, Step 2 is an RNL EVERYONE at ZOS 1.5 and above should
implement AS SOON AS POSSIBLE (see considerations below), UNTIL
the PTF is available and IS APPLIED.
Circumvention:
1.) As a partial assist when SYSZIGDI GRS enqueues are seen,
immediately set SMS INTERVAL (not DINTERVAL) to 60 seconds via
MVS command:
SETSMS INTERVAL(60)
and see message:
IEE712I SETSMS PROCESSING COMPLETE
Note: This INTERVAL change should only reduce the frequency of
enq's caused by interval processing happening "now", but
will not "solve" the problem. For instance, if several vary
commands, and an activate, or a backup operation all happen
during a period of time, they can have (and have had) contention
among themselves for extended periods of time.
2.) ALSO AT THE EARLIEST OPPORTUNITY - ADD the following RNL entry
to your GRS exclude list (will also reduce GRS cpu utilization):
RNLDEF RNL(EXCL) TYPE(SPECIFIC) QNAME(SYSZIGDI)
RNAME('ICMRT.CMDSADDR_LOCKED')
Considerations:
Failure to implement this RNL (or the PTF when available) has
shown heavy impact on the length of time which is required for a
command - which updates the COMMDS - to propagate around all
members of an SMSPLEX, and across the SYSPLEX. Examples of such
commands are SMS ACTIVATE and VARY SMS.
3. Consider making the following change in your default IGDSMSxx member
of parmlib, which will minimize the impact of this enqueue until
a PTF fix is available, or until you have implemented the RNL.
After PTF or RNL implementation, consider returning to your
original INTERVAL value.
INTERVAL(60)
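
To check whether the exclusion from step 2 is already present, a rough
Python sketch along the following lines could scan a copy of the GRS RNL
parmlib member that has been downloaded as plain text. This is only an
illustration under stated assumptions: the function name
has_syszigdi_exclusion and the sample text are hypothetical, the keywords
are assumed to appear in the same order as in the APAR text, and
comments or other member syntax are not handled.

```python
import re


def has_syszigdi_exclusion(member_text: str) -> bool:
    """Return True if the text contains the exclusion entry from step 2."""
    # Drop all whitespace so a statement split across lines still matches.
    compact = re.sub(r"\s+", "", member_text).upper()
    # The statement from the APAR text, with whitespace removed.
    wanted = (
        "RNLDEFRNL(EXCL)TYPE(SPECIFIC)QNAME(SYSZIGDI)"
        "RNAME('ICMRT.CMDSADDR_LOCKED')"
    )
    return wanted in compact


# Hypothetical member content, copied from the statement in step 2.
sample = """
RNLDEF RNL(EXCL) TYPE(SPECIFIC) QNAME(SYSZIGDI)
       RNAME('ICMRT.CMDSADDR_LOCKED')
"""
print(has_syszigdi_exclusion(sample))
```

A match in the member text only shows that the entry has been coded. As
the recommendation above notes, the RNL must be in place around the
whole sysplex to be fully effective.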
PROBLEM SUMMARY:
****************************************************************
* USERS AFFECTED: All DFSMS users *
****************************************************************
* PROBLEM DESCRIPTION: At DFSMS Release 1.5 the GRS resource's *
* scope used to serialize access, read or *
* update, to the SMS configuration was *
* changed from LOCAL to GLOBAL. The *
* change has resulted in a GRS SYSPLEX *
* lock out when the system holding the *
* resource encounters another problem. *
****************************************************************
* RECOMMENDATION: *
****************************************************************
The SMS GRS resource used to serialize access to the local
configurations was changed from LOCAL to GLOBAL to fix a
configuration refresh problem. Since that time the source of the
refresh problem has been identified and fixed by subsequent
APARs.
PROBLEM CONCLUSION:
The GRS resource used to serialize access to the SMS local copy
of the configuration has been changed back to LOCAL from GLOBAL.