Hi,

I have noticed recurring enqueue contention on SYSZIGDI, and I think
APAR OA14084 explains the cause.
We took a hard look at it when it cropped up during another problem. We
now think the two are unrelated, but the APAR is "interesting" anyway.

If you are interested in things that affect the RNLs you use, or you are
seeing delays when you issue SMS commands, this may be worth a look.

        Best Regards, 

                Sam Knutson, GEICO 
                Performance and Availability Management 
                mailto:[EMAIL PROTECTED] 
                (office)  301.986.3574 

Our life is frittered away by detail. Simplify, simplify. Henry David
Thoreau (1817-1862)


APAR Identifier ...... OA14084      Last Changed ........ 06/03/01
  ZOS 1.5 HDZ11H0 CHANGED THE ENQUE SYSZIGDI FROM LOCAL (SYSTEM)
  TO GLOBAL (SYSTEMS), CAUSING LARGE INCREASE OF ENQ'S PROPAGATED
 
  Symptom ...... WS WAITxxx           Status ........... CLOSED  PER
  Severity ................... 3      Date Closed ......... 05/12/09
  Component .......... 5695DF101      Duplicate of ........
  Reported Release ......... 1H0      Fixed Release ............ 999
  Component Name STORAGE MGMT SU      Special Notice           HIPER
  Current Target Date ..06/01/08      Flags
  SCP ...................                              PERFORMANCE
  Platform ............
 
  Status Detail: SHIPMENT - Packaged solution is available for
                            shipment.
 
  PE PTF List:
 
  PTF List:
  Release 1H0   : UA23264 available 06/02/08 (F602 )
  Release 1J0   : UA23265 available 06/02/08 (F602 )
  Release 1K0   : UA23266 available 06/02/08 (F602 )
 
 
  Parent APAR:
  Child APAR list:
 
 
  ERROR DESCRIPTION:
 
  Environment: z/OS R1.5 and above.
 
  Problem Description:
  z/OS 1.5 (HDZ11H0) changed the enqueue SYSZIGDI from local (SYSTEM)
  to global (SYSTEMS), causing a large increase in propagated ENQs,
  which can cause increased contention, increased CPU
  consumption, and intermittent ESQA exhaustion, which in turn can
  cause system hang and IPL symptoms.
  Note: This can heavily impact the length of time required for a
  command that updates the COMMDS to propagate around all members
  of an SMSPLEX. An example of such a command is VARY SMS.
 
  Additional keywords:
  Disabled wait 0101 WAIT101 "long vary propagation time around
  smsplex"


  Additional symptoms:
  1. UCBSMS bit is off (UCB FL5 = x'88' vs x'A8') in an
  unpredictable fashion, due to delay in SMS VARY processing.
 
  2. SMS configuration changes such as ACTIVATE or VARY processing
  are excessively delayed (some clients have reported hours). This
  is also related to extremely busy GRS processing, and is
  considered a "trigger event" which can contribute to a
  "bottleneck" for other applications which use GRS services.
 
  3. System hang, standalone dump (SAD) with SVC dump title:
  'END OF MEMORY RESOURCE MANAGER HANG DETECTED: TCB = 008CA7D0,
  NAME = ISGGTRM0- SCSDS'
 
  Recommendation:  Implement the exclusion RNL. Be aware that the
  RNL must be in place on every system in the sysplex to be
  completely effective.  Implement it during a period of low
  SYSZIGDI (or other) ENQ activity, since heavy ENQ activity can
  prevent the RNL's activation.
 
 
  LOCAL FIX:
  Note, Step 1 is most helpful only if you are undergoing current
  system slowdown due to GRS being flooded with SYSZIGDI, which is
  impacting other workload.
 
  Note, Step 2 is an RNL EVERYONE at ZOS 1.5 and above should
  implement AS SOON AS POSSIBLE (see considerations below), UNTIL
  the PTF is available and IS APPLIED.
 
  Circumvention:
  1.) As a partial assist when SYSZIGDI GRS enqueues are seen,
  immediately set SMS INTERVAL  (not DINTERVAL) to 60 seconds via
  MVS command:
     SETSMS INTERVAL(60)
  and see message:
     IEE712I SETSMS   PROCESSING COMPLETE
 
  Note: This INTERVAL change should only reduce the frequency of
  ENQs caused by interval processing happening "now"; it will not
  "solve" the problem.  For instance, if several VARY commands and
  an ACTIVATE or a backup operation all happen within the same
  period, they can contend (and have contended) among themselves
  for extended periods of time.
 
  2.) ALSO AT THE EARLIEST OPPORTUNITY - ADD the following RNL entry
  to your GRS exclusion list (will also reduce GRS CPU utilization):
 
  RNLDEF RNL(EXCL) TYPE(SPECIFIC) QNAME(SYSZIGDI)
    RNAME('ICMRT.CMDSADDR_LOCKED')
 
  Considerations:
  Failure to implement this RNL (or the PTF when available) has
  shown heavy impact on the length of time required for a command
  that updates the COMMDS to propagate around all members of an
  SMSPLEX, and across the SYSPLEX. Examples of such commands are
  SMS ACTIVATE and VARY SMS.
 
  3.) Consider making the following change in your default IGDSMSxx
  member of parmlib, which will minimize the impact of this enqueue
  until a PTF fix is available, or until you have implemented the
  RNL. After PTF or RNL implementation, consider returning to your
  original INTERVAL value.
 
     INTERVAL(60)
 
 
  PROBLEM SUMMARY:
  ****************************************************************
  * USERS AFFECTED: All DFSMS users                              *
  ****************************************************************
  * PROBLEM DESCRIPTION: At DFSMS Release 1.5 the GRS resource's *
  *                      scope used to serialize access, read or *
  *                      update, to the SMS configuration was    *
  *                      changed from LOCAL to GLOBAL. The       *
  *                      change has resulted in a GRS SYSPLEX    *
  *                      lock out when the system holding the    *
  *                      resource encounters another problem.    *
  ****************************************************************
  * RECOMMENDATION:                                              *
  ****************************************************************
  The SMS GRS resource used to serialize access to the local
  configurations was changed from LOCAL to GLOBAL to fix a
  configuration refresh problem. Since that time the refresh
  problem source has been identified and fixed by subsequent
  APARs.
 
 
  PROBLEM CONCLUSION:
  The GRS resource used to serialize access to the SMS local copy
  of the configuration has been changed back to LOCAL from GLOBAL.
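
The following is a sketch of how the circumvention steps in the APAR might be checked and applied from the console; it is not part of the APAR text. The GRSRNLxx suffix 01 is hypothetical, and the text to the right of each command is annotation only, not something to type. The command forms should be verified against the MVS System Commands manual for your release before use.

```
D GRS,C                     show current ENQ contention
D GRS,RES=(SYSZIGDI,*)      show owners and waiters for qname SYSZIGDI
D SMS,OPTIONS               show current SMS options, including INTERVAL
SETSMS INTERVAL(60)         step 1: lengthen the SMS interval
SET GRSRNL=(01)             step 2: activate GRSRNL01 (hypothetical suffix)
                            containing the RNLDEF from the APAR
D GRS,RNL=EXCL              confirm the SYSZIGDI entry is in the exclusion RNL
```

As the APAR notes, the RNL change is best activated during a period of low ENQ activity, and INTERVAL can be returned to its original value once the RNL or the PTF is in place.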
