Thanks, Mark, for the heads up. We've not seen the behavior noted in this APAR, although I am tracking it now for closure. In my case, I got lazy over the years and hadn't been paying attention to CSA/ECSA and SQA/ESQA allocation and utilization, which have crept up. ESQA has been hovering around 80-85% on my heavily used LPARs. We've had one production system spill from ESQA to ECSA since our last IPL in early May. Since we've got our fall IPLs coming up, I'm adjusting it to get it back into a more comfortable range.
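For anyone wanting to sanity-check their own headroom, here is a rough sketch of the arithmetic. All sizes are made-up illustrative numbers in MB, not values from any real system, and the function names are hypothetical. It only models the simple case where an ESQA demand that exceeds free ESQA overflows into ECSA.

```python
# Hypothetical sizes in MB; substitute the figures your own monitor reports.

def spill_into_ecsa(esqa_size_mb, esqa_used_mb, spike_mb):
    """Return how many MB of a sudden ESQA demand would overflow into ECSA."""
    free_mb = esqa_size_mb - esqa_used_mb
    return max(0, spike_mb - free_mb)


def ecsa_utilization_after(ecsa_size_mb, ecsa_used_mb, spill_mb):
    """ECSA utilization as a fraction once the spill is added.

    A result of 1.0 or more means ECSA would be exhausted.
    """
    return (ecsa_used_mb + spill_mb) / ecsa_size_mb


def esqa_size_for_target(esqa_used_mb, target_util):
    """ESQA size needed so that current usage sits at target_util (0 < target_util <= 1)."""
    return esqa_used_mb / target_util


if __name__ == "__main__":
    # Assumed example: 200 MB ESQA at 85% used, an 80 MB spike,
    # and a 400 MB ECSA that normally runs at 70%.
    spill = spill_into_ecsa(200, 170, 80)
    print("spill into ECSA (MB):", spill)
    print("ECSA utilization after spill:", ecsa_utilization_after(400, 280, spill))
    print("ESQA size for 70% target (MB):", esqa_size_for_target(170, 0.70))
```

This ignores fragmentation and anything else allocating at the same moment, so treat the result as a lower bound on the headroom you need rather than a sizing recommendation.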
_____________________________________________________________________________________________________
Dave Jousma
AVP | Manager, Systems Engineering
Fifth Third Bank | 1830 East Paris Ave, SE | MD RSCB2H | Grand Rapids, MI 49546
616.653.8429 | fax: 616.653.2717

-----Original Message-----
From: IBM Mainframe Discussion List <[email protected]> On Behalf Of Mark Zelden
Sent: Thursday, October 3, 2019 5:19 PM
To: [email protected]
Subject: APAR OA58438 (was Re: Planned ESQA change and HealthCheck)

**CAUTION EXTERNAL EMAIL**
**DO NOT open attachments or click on links from unknown senders or unexpected emails**

If you are running z/OS 2.3 and increasing ESQA because of expansion into ECSA messages or sudden unexplained growth, check out APAR OA58438.

We had 3 system crashes after migrations to z/OS 2.3 in 2019 and one close call after ECSA got to 99% when ESQA expanded into it (only a vendor monitor crashed in that case after a failed ECSA getmain). Stand-alone dumps didn't find the root cause, other than that we knew it was RPB pool growth related to SVC dumps from CICS. In one case, a single SVC dump caused an 80M ESQA spike within one or two seconds, which crashed a system when it spilled into ECSA and also filled up ECSA (typically at about 70% use, but "stable").

We worked with IBM all summer on this. We had different SLIPs and GTF traces put in place, but with the traces going the problem never happened. But SVC dump processing did take over the CPU with the trace + GTF active! :-)

Meanwhile, we increased ESQA on 30 LPARs via normal IPLs over the summer by about 80M, and ECSA a bit, as a "work around". These are settings that hadn't been touched in god knows how long (certainly not since 64-bit usage and HVCOMMON have increased). So we had to lose about 100M of high private to do this.
We also increased real storage on a couple of LPARs that really didn't warrant it (based on zero or close to zero demand paging during normal operations), but we knew real storage was also involved in the problem (no flash memory for SVC dumps on my client's mainframes).

The entire time, IBM has said we are the only ones reporting the problem, but since we had the problem in big sysplexes, small sysplexes, big LPARs, and small LPARs, I know that we can't be the only ones. I think other shops are ignoring the ESQA expansion into ECSA (since that in itself doesn't hurt) and/or they have more "white space". The RPB control blocks are freed after about 10 minutes, so anyone looking at their current ESQA (and ECSA) usage wouldn't notice the spikes, or would just say "oh well, looks good now".

Anyway, IBM was getting close to figuring this out not too long ago. They partially re-created the problem in the lab some weeks ago and got back to us today with the root cause and the APAR that was opened. It is related to being real storage constrained at the time of the SVC dumps (I think all of the crashes were during CICS startup time in the wee morning hours).

I really wanted to post something about this earlier but didn't, since IBM said they had no other reported problems. So if you have seen this problem since migrating to z/OS 2.3, now you know you aren't the only ones.

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions, send email to [email protected] with the message: INFO IBM-MAIN
