While we wait for the EREP, it might be fun to look at the presented evidence and do some speculating.
I don’t think a single task can do this, even at an FF priority. There are interrupts, and a task under the control of the dispatcher can't own the whole box (even a uniprocessor) exclusively for that long. That would point to something deeper.

I like the spin loop scenario: some task grabbed a spin lock and the hardware had to step in to break its grip. A hole in that logic is the complete absence of recognizable SYSLOG messages both before and after the event.

Another angle is that any time the hardware takes some drastic action, there is almost always a 'phone home' event. I'd want to look at the HMC/SE logs and call the support center to get their perspective.

If it were my call, I'd take the shop to sev 2 right now. If I didn't have a satisfactory explanation in short order, I'd up the bar to sev 1 (nobody goes home) and have everyone start thinking about having to pull a DR trigger. I would have to assume that the next time the box goes to sleep, it may not wake up.

Just my $0.02 US (before taxes)

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:[email protected]] On Behalf Of Eric Bielefeld
Sent: Wednesday, January 07, 2009 12:06 PM
To: [email protected]
Subject: Re: MVS 4 minute 'outage'

I would suggest that some very high priority task got in a loop - something that runs at dispatching priority x'FF'. But then if it were truly in a loop, why did it stop after 4 minutes? Do you just have 1 processor?

Good luck in finding the problem. Let us know if it happens again. I'm sure you have IBM involved - that sure sounds like a Sev 1 to me.

Eric

---- JE Thinnes <[email protected]> wrote:
> We just experienced a 4 minute 'outage' on our z/OS system. (single image
> z/OS 1.9 system).
>
> By 'outage', I mean we could not communicate with MVS through TSO or the
> z/OS consoles. There is a 4 minute gap in SYSLOG. The same for CICS, IMS
> and DB2 logs.
>
> There were no system dumps or other indicators.
>
> We reviewed SYSLOG for the 15 minutes that preceded the 'outage' and did
> not find anything. TMONMVS had a 4 minute gap in the collector during
> the 'outage'.
>
> Any suggestions how we can determine what happened?

--
Eric Bielefeld
Systems Programmer
Washington University
St Louis, Missouri
314-935-3418

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html

