Thanks, Mark, for the heads-up.  We've not seen the behavior noted in this APAR, 
although I am tracking it now for closure.   In my case, I got lazy over the 
years and hadn’t been paying attention to CSA/ECSA and SQA/ESQA allocation and 
utilization, which have crept up.   ESQA has been hovering around 80-85% on my 
heavily used LPARs.   We've had one production system spill from ESQA to ECSA 
since our last IPL in early May.   Since we've got our fall IPLs coming up, 
I'm adjusting it to get it back into a more comfortable range.

_____________________________________________________________________________________________________
Dave Jousma
AVP | Manager, Systems Engineering  

Fifth Third Bank  |  1830 East Paris Ave, SE  |  MD RSCB2H  |  Grand Rapids, MI 
49546
616.653.8429  |  fax: 616.653.2717


-----Original Message-----
From: IBM Mainframe Discussion List <[email protected]> On Behalf Of 
Mark Zelden
Sent: Thursday, October 3, 2019 5:19 PM
To: [email protected]
Subject: APAR OA58438 (was Re: Planned ESQA change and HealthCheck)

If you are running z/OS 2.3 and increasing ESQA because of messages about 
expansion into ECSA or sudden unexplained growth, check out APAR OA58438.

We had 3 system crashes after migrations to z/OS 2.3 in 2019 and one close call 
after ECSA got to 99% when ESQA expanded into it (only a vendor monitor crashed 
in that case after a failed ECSA getmain).  Stand-alone dumps didn't find the 
root cause other than we knew it was RPB pool growth related to SVC dumps from 
CICS.  In one case a single SVC dump caused an 80M ESQA spike within one or two 
seconds that crashed a system when it spilled into ECSA and also filled up ECSA 
(which was typically at about 70% use, but "stable").
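
To make the spill arithmetic concrete, here is a minimal Python sketch. It is 
a deliberately simplified model, not how z/OS actually manages storage: it 
assumes that whatever part of an ESQA spike does not fit in free ESQA is taken 
from free ECSA, and it ignores fragmentation. The function name and all sizes 
are hypothetical, not values from any of the LPARs discussed here.

```python
def esqa_spike(esqa_size_mb, esqa_used_mb, ecsa_size_mb, ecsa_used_mb, spike_mb):
    """Simplified model of an ESQA spike overflowing into ECSA (sizes in MB).

    Assumption: any part of the spike that does not fit in free ESQA
    is satisfied from ECSA. Fragmentation is ignored.
    """
    esqa_free = max(esqa_size_mb - esqa_used_mb, 0)
    # Portion of the spike that cannot be satisfied from ESQA itself.
    overflow = max(spike_mb - esqa_free, 0)
    ecsa_after = ecsa_used_mb + overflow
    return {
        "overflow_mb": overflow,
        # Utilization is capped at 100% once ECSA is full.
        "ecsa_pct": min(ecsa_after, ecsa_size_mb) / ecsa_size_mb * 100,
        "ecsa_exhausted": ecsa_after >= ecsa_size_mb,
    }


if __name__ == "__main__":
    # Hypothetical LPAR: 60 MB ESQA with 50 MB in use, 200 MB ECSA at 70%.
    print(esqa_spike(60, 50, 200, 140, 80))
```

With those hypothetical sizes, an 80M spike leaves 70M to overflow into ECSA, 
which is more than the 60M that was free there, so the model reports ECSA as 
exhausted even though it looked "stable" at 70% beforehand.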

We worked with IBM all summer on this.  We had different SLIPs and GTF traces 
put in place, but with the traces going the problem never happened.  But SVC 
dump processing did take over the CPU with the trace + GTF active!   :-)

Meanwhile, we increased ESQA on 30 LPARs via normal IPLs over the summer by 
about 80M and ECSA a bit as a "work around".   Settings that haven't been 
touched in god knows how long (certainly not since 64-bit usage has increased 
and HVCOMMON).   So we had to lose about 100M of high private to do this.  We 
also increased real storage on a couple of LPARs that really didn't warrant it 
(based on zero or close to zero demand paging during normal operations), but we 
knew real storage was also involved in the problem (no flash memory for SVC 
dumps on my client's mainframes).

The entire time IBM has said we are the only ones reporting the problem, but 
since we had the problem in big sysplexes, small sysplexes, big LPARs, small 
LPARs, I know that we can't be the only ones.  I think other shops are ignoring 
the ESQA expansion into ECSA (since that in itself doesn't hurt) and / or they 
have more "white space".  The RPB control blocks are freed after about 10 
minutes, so anyone looking at their current ESQA (and ECSA) usage wouldn't 
notice the spikes or would just say "oh well, looks good now".

Anyway, IBM was getting close to figuring this out not too long ago and 
partially re-created the problem in the lab some weeks ago and just got back to 
us today with the root cause and the APAR that was opened.   It is related to 
being real storage constrained at the time of the SVC dumps (I think all of the 
crashes were during CICS startup time in the wee morning hours).

I really wanted to post something about this earlier but didn't since IBM said 
they had no other reported problems.  So if you have seen this problem since 
migrating to z/OS 2.3, now you know you aren't the only ones.

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions, send email to 
[email protected] with the message: INFO IBM-MAIN
