While we wait for the EREP, it might be fun to look at the presented evidence 
and do some speculating. 

I don’t think a single task can do this, even at dispatching priority x'FF'. 
There are interrupts, and a task under the control of the dispatcher can't own 
the whole box (even a uniprocessor) exclusively for that long. 

That would point to something deeper. I like the spin loop scenario. Some task 
grabbed a spin lock and the hardware had to step in to break its grip. 

A hole in that logic is the complete absence of recognizable SYSLOG messages 
both before and after the event. Another avenue to pursue: any time the 
hardware takes some drastic action, there is almost always a 'phone home' 
event. I'd want to look at the HMC/SE logs and call the support center to get 
their perspective. 
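
For what it's worth, pinning down exactly where each log goes quiet is easy to 
script once the timestamps are extracted. A minimal sketch in Python, assuming 
the time-of-day values have already been pulled out of a log as HH:MM:SS 
strings from a single day; the name find_gaps, the sample times, and the 
60-second threshold are made up for illustration and do not come from any real 
SYSLOG:

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, threshold=timedelta(seconds=60), fmt="%H:%M:%S"):
    """Return (before, after, gap) for each pair of consecutive timestamps
    that are further apart than `threshold`.

    Assumes the timestamps are in ascending order and all from the same day.
    """
    parsed = [datetime.strptime(t, fmt) for t in timestamps]
    gaps = []
    for earlier, later in zip(parsed, parsed[1:]):
        delta = later - earlier
        if delta > threshold:
            gaps.append((earlier.strftime(fmt), later.strftime(fmt), delta))
    return gaps


# Hypothetical sample data, not real log content.
sample = ["12:00:01", "12:00:07", "12:00:15", "12:04:16", "12:04:20"]
for before, after, gap in find_gaps(sample):
    print(f"gap of {gap} between {before} and {after}")
```

Running the same check against the SYSLOG, CICS, IMS and DB2 timestamps 
separately would show whether they all stopped and resumed at the same moment 
or drifted apart, which might help narrow down where to look.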

If it were my call, I'd take the shop to sev 2 right now. If I don't have a 
satisfactory explanation in short order, then I'd up the bar to sev 1 (nobody 
goes home) and have everyone start thinking about having to pull a DR trigger. 
I would have to assume that the next time the box goes to sleep, it may not 
wake up. 

Just my $0.02 US (before Taxes)     


-----Original Message-----
From: IBM Mainframe Discussion List [mailto:[email protected]] On Behalf Of 
Eric Bielefeld
Sent: Wednesday, January 07, 2009 12:06 PM
To: [email protected]
Subject: Re: MVS 4 minute 'outage'

I would suggest that some very high priority task got in a loop - something 
that runs at dispatching priority x'FF'.  But then if it were truly in a loop, 
why did it stop after 4 minutes?  

Do you just have 1 processor?  

Good luck in finding the problem.  Let us know if it happens again.  I'm sure 
you have IBM involved - that sure sounds like a Sev 1 to me.

Eric

---- JE Thinnes <[email protected]> wrote: 
> We just experienced a 4 minute 'outage' on our z/OS system.  (single image 
> z/OS 1.9 system).
> 
> By 'outage', I mean we could not communicate with MVS through TSO or the 
> z/OS consoles.  There is a 4 minute gap in SYSLOG.  The same for CICS, IMS 
> and DB2 logs.  
> 
> There were no system dumps or other indicators.
> 
> We reviewed SYSLOG for the 15 minutes that preceded the 'outage' and did 
> not find anything.  TMONMVS had a 4 minute gap in the collector during 
> the 'outage'.
> 
> Any suggestions how we can determine what happened?
--

Eric Bielefeld
Systems Programmer
Washington University
St Louis, Missouri
314-935-3418

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html
