>- What are the typical steps taken to identify/recover from a WSC? For 
>example,

The one occurrence we had recently that resulted in a wait state (during a 
hardware migration, a hot swap to a new CPU) was entirely due to IBM changing, 
without notice, rules that had applied forever. In other words, it was what I 
consider a bug (IBM's opinion differs). The wait state code was ridiculous (GRS 
saying it couldn't get to the ISGLOCK structure), especially given that this 
was the *second* system in that sysplex to be IPL'd, and the first was happily 
running along on the same box using that structure. Besides, GRS wasn't the 
culprit in this, even though it loaded the wait state.

My colleagues were unable to determine *what* caused the wait state, for the 
simple reason that the accompanying message was no longer visible. For this I 
also blame IBM, who apparently do all their testing under VM (where the NIP 
messages stay on the console) and have no clue about the real world, which 
actually uses 'real' consoles, NOT consoles under VM, as NIP consoles. These 
days a 'real' console is somehow attached to an IP network. The second the wait 
state hits, that console is released to the network, which then puts its 
welcome message back onto the console, effectively wiping out any messages that 
might have been visible there. And believe me, that happens so fast that you 
have no chance of seeing any preceding message, let alone comprehending it. 

Admittedly, at this point an sadump *should* have been taken, just to get at 
the NIP messages. Unfortunately, I am the resident dump reader, and I wasn't 
there that night (besides, I would have known immediately how to fix this and 
wouldn't have needed the sadump). So it took three of my colleagues about four 
hours of trial and error to find a bypass, until they inadvertently hit on the 
right thing to do. (Don't tell me that we should have opened an ETR with IBM; 
that wouldn't have produced faster results either, since we are NOT a US 
customer.)

IBM should really test their systems on real consoles (not under VM, and not on 
the HMC, either), because IBM doesn't even understand how hard it is to see the 
NIP messages preceding a problem. In addition, *EVERY* NIP message explaining a 
wait state should be *in the wait state message*, not in some preceding 'for 
your info' message. This matters most when the component loading the wait state 
wasn't the culprit. 

- Are the existing messages useful/sufficient for identifying the underlying 
cause and how to correct the WSC?
No.

- Are there circumstances (e.g. console setup, etc.) that prevent the existing 
messages from being seen?
Use real consoles for testing, not the HMC and not VM consoles. And don't test 
new designs only with what IBM thinks a customer should do, and only at the 
upper limits of each parameter. I have the impression that lower limits are 
never tested, even when they are allowed. IBM's test scenarios lack real-world 
experience; it seems to be an ivory-tower outlook.

- Are additional/better/different messages needed to identify and/or correct 
the cause of the WSC?
Yes. Those that don't disappear when the system hits the wait state.

- Is there any need to diagnose and/or correct problems without resorting to 
another running z/OS system?
Yes, if it is the first system to get IPL'd on new hardware and the wait state 
code implies some sort of setup problem that will need to get fixed by changing 
parms. Others have commented on this.
If you have to take an sadump to find out why you got the wait state, you will 
always need a running system just to debug it, or to send the dump to IBM.

- To what extent, if any, is it desirable to avoid the WSC by providing some 
means of error recovery?
Long before any error recovery comes into play, it is desirable not to change 
the rules, especially in sensitive areas of IPL, so that a customer with a 
well-known, previously working setup doesn't hit a wait state code just because 
design changes weren't thoroughly tested with real-world scenarios and lower 
limits.

Barbara Nitz

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@bama.ua.edu with the message: INFO IBM-MAIN
