>- What are the typical steps taken to identify/recover from a WSC ? For
>example,

The one occurrence we had recently that resulted in a wait state (during a hardware migration, a hot swap to a new CPU) was entirely due to IBM changing, without notice, rules that had applied forever; in other words, something that I consider a bug. (IBM differs on that opinion.) The wait state code was ridiculous (GRS saying it couldn't get to the ISGLOCK structure), especially given that this was the *second* system in that sysplex to get IPL'd, and the first was happily running along on the same box using that structure. Besides, GRS wasn't the culprit in this, even though GRS loaded the wait state.

My colleagues were unable to determine *what* caused the wait state, for the simple reason that the accompanying message wasn't visible anymore. For this, I also blame IBM, who apparently do all their testing under VM (where the NIP messages stay on the console) and have no clue about the real world, which actually uses 'real' consoles NOT under VM as NIP consoles. A 'real' console is these days attached somehow to an IP network. The second the wait state hits, that console is released to the network, which means that the 'IP network' puts its welcome message back onto that console, effectively wiping out any messages that might have been visible there. And believe me, that happens so fast that you have no chance of seeing any preceding message, much less comprehending it.

Admittedly, at this point an sadump *should* have been taken, just to get at the NIP messages. Unfortunately I am the resident dump reader, and I wasn't there that night (besides, I would have known immediately how to fix this and wouldn't have needed the sadump). So it took three of my colleagues about 4 hours of trial and error to find a bypass, until they inadvertently hit on the right thing to do. (Don't tell me that we should have opened an ETR with IBM; that wouldn't have given faster results, either, since we are NOT a US customer.)

IBM should really test their systems on real consoles (not VM, and not on the HMC, either), because IBM doesn't even understand how hard it is to see the NIP messages preceding a problem. In addition, *EVERY* NIP message explaining a wait state should be *in the wait state message*, not in some preceding 'for your info' thing. Especially when the component loading the wait state wasn't the culprit.

>- Are the existing messages useful/sufficient for identifying the underlying cause and how to correct the WSC ?

No.

>- Are there circumstances (e.g. console setup, etc.) that prevent the existing messages from being seen ?

Use real consoles for testing, not the HMC and not VM consoles. And don't test new designs only by doing what IBM thinks a customer should do and by testing only the upper limits of any parameter. I have the impression that lower limits are never tested, even if they are allowed. And IBM's test scenarios lack real-world experience; it seems to be an ivory-tower outlook.

>- Are additional/better/different messages needed to identify and/or correct the cause of the WSC ?

Yes. Those that don't disappear when the system hits the wait state.

>- Is there any need to diagnose and/or correct problems without resorting to another running z/OS system ?

Yes, if it is the first system to get IPL'd on new hardware and the wait state code implies some sort of setup problem that will need to get fixed by changing parms. Others have commented on this. If you have to take an sadump to find out why you got the wait state, you will always need a system up and running to even debug it, or to send the dump to IBM.

>- To what extent, if any, is it desirable to avoid the WSC by providing some means of error recovery ?

It is desirable not to change the rules, especially in sensitive areas of IPL, long before the customer hits a wait state code with a well-known and previously working setup just because design changes weren't thoroughly tested with real-world scenarios and lower limits.

Barbara Nitz