It's Friday, so I can (re)tell my war story. Shortly after z/OS R13 hit our first prod system, I noticed one morning that the system had been IPLed around 05:00. Everyone denied having done it. Then I discovered a fresh SAD taken around the same time. Sent it off to IBM. Within a day or two, the same thing happened again. The system was wait-stating after running clean out of storage frames! It made no sense. I posted the problem here.

Jim Mulder saw the thread and rang me up with a few questions. The failing system, unlike the sandbox, was being mirrored to the DR data center. All of it. Every single volume. Jim suspected and then confirmed that because of a change in R13, a failing page-in caused an I/O redrive, which lost track of the failing page request, which never got put back on the queue. With XRC active, some percentage of page-ins got tangled up with SDM I/O. More lost frames. Eventually MVS ran completely out of frames, and the system wait-stated. Auto SAD. Auto IPL. It was Development, so at 05:00, there were no user calls. Ops never noticed. Jim fixed the problem immediately.

I believe in auto IPL.

.
.
J.O.Skip Robinson
Southern California Edison Company
Electric Dragon Team Paddler
SHARE MVS Program Co-Manager
323-715-0595 Mobile
626-543-6132 Office ⇐=== NEW
robin...@sce.com

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Mark Zelden
Sent: Friday, May 12, 2017 12:02 PM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: (External):Re: AUTOIPL SADUMP LOADPARM flag value

On Fri, 12 May 2017 16:18:14 +0000, Jesse 1 Robinson <jesse1.robin...@sce.com> wrote:

>I'm curious as to why you do not want automatic reIPL after SADMP. Your
>system is in a non-restartable wait state, after all. I view that as the
>ultimate performance degradation. ;-) You have an SAD. If you want to look at
>it or at OPERLOG, you need at least one system in the sysplex up and
>running. Why not this one?
>
>IBM has recommended auto IPL for many years based on decades of problem
>analysis. Nothing will ever get better on a dead system. ReIPL might fail,
>but it's worth a try. You can also speed up SAD such that no operator
>intervention is required. It's possible for a system to die, take an SAD,
>and reIPL before the operator gets back from coffee break. I've seen it
>happen.
>

If IBM-MAIN had a like button or thumbs up, you would have it.

Haven't actually had a crash in a long time, but the last time my client had one, that was basically the scenario. By the time I was getting instant messages and automated alerts / pages were going out to everyone, the system was already back up and had 100% application availability. It was something like a 10 minute outage total. The client wasn't happy, but it sure beats the heck out of "the old days" of initiating a stand-alone dump manually and re-IPLing after it completed. That's if an operator or even a sysprog could find the doc or knew how to do an SADUMP and do it correctly!

Regards,

Mark
--
Mark Zelden - Zelden Consulting Services - z/OS, OS/390 and MVS
ITIL v3 Foundation Certified
mailto:m...@mzelden.com
Mark's MVS Utilities: http://www.mzelden.com/mvsutil.html

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN