>We have INTERVAL(85) and OPNOTIFY(87) and CLEANUP(15) so my question
>is when the resources get frozen.

I hope that someone will correct me if I get this wrong:
System removal timeline, assuming all defaults:

1. Issue v xcf,sysname,offline and reply to the prompt that comes for the v xcf,off. This is what kicks everything off. In an ideal world, CLEANUP starts to run, meaning XCF on the system being shut down notifies all members of all XCF groups in the sysplex that it is about to load a wait state. Every XCF group is now supposed to do 'cleanup' for the leaving member.

2. After CLEANUP expires, the system being shut down loads the non-restartable wait state 0A2. This kicks off INTERVAL and OPNOTIFY. Loading a wait state means that the other systems detect SSUM (system status update missing), so INTERVAL (by default 2*SPINTIME + 5s) starts to run. At INTERVAL+3s, message IXC102A is issued on the surviving system.

3. Once the wait state is loaded, system reset can take place. There is no need to wait for IXC102A to be issued. The reply to IXC102A probably leads to XCF telling all XCF group members 'system reported gone'. (I have no clue which events are issued in which order. The events reported to all XCF group members are described in detail in the Sysplex Services Guide under 'Events that Cause XCF to Schedule a Group User Routine' - have fun reading that!)

So much for the system side of shutdown. What I have no clue about is how TCPIP handles XCF telling it that a member of the XCF group is going away. TCPIP might already start reacting if and when the XCF group member terminates on the system where the 'p TCPIP' command was issued for shutdown. Some sort of XCF cleanup (like 'leave the group') must be done when TCPIP terminates. I don't know whether TCPIP's 'detection of timeout' starts at the point CLEANUP starts to run, or with SSUM, or in reaction to any of the other state changes. TCPIP could also have timers completely independent of any XCF traffic that cause the messages you see.

>We have never required operators to RESET a down system once a wait
>state is achieved after V XCF OFF.
>I can't deny the possibility of some comatose I/O operation miraculously
>coming to life after seconds or minutes of WAIT STATE.

Skip, I don't think the danger of a missing reset is so much an errant I/O. It's rather a hardware reserve that didn't get cleared for one reason or another, preventing other systems from accessing that device. Once in a parallel sysplex, fencing is done via the CF without the need for explicit system reset (except for the last system in the sysplex). So mostly, the warnings are for basic sysplexes that cannot do automatic fencing. There it is really important to get all reserves released.

>I can however attest to the inherent risk of an operator--or a
>distracted sysprog--going to the trouble of unlocking an LPAR in order to
>RESET...the wrong image.

BTDTGTS, too, in this installation. Not by me, fortunately (I shut down one system and then varied offline the one that I hadn't shut down - thankfully it was the sysprog sandplex)! Besides, in case there are IPL problems, I wouldn't want to give IBM even a slim chance of telling me 'you shouldn't have done that, so we're not fixing anything'.

Barbara

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html
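
P.S. For anyone who wants to put numbers on the removal timeline above, here is a minimal Python sketch of the arithmetic. It only follows the steps as described in this post (which may well be wrong), and it assumes the worst case in which CLEANUP runs for its full value before the wait state is loaded. The function names are made up for illustration; nothing here is an XCF interface.

```python
def default_interval(spintime):
    """Default INTERVAL as stated in the post: twice SPINTIME plus 5 seconds."""
    return 2 * spintime + 5


def default_opnotify(interval):
    """Default point at which IXC102A appears, as stated in the post: INTERVAL + 3s."""
    return interval + 3


def removal_timeline(interval, opnotify, cleanup):
    """Return (seconds after the V XCF reply, event) pairs.

    Assumption: CLEANUP runs to its full value, so the wait state is
    loaded `cleanup` seconds after the reply. In practice cleanup may
    finish earlier, so treat these offsets as upper bounds.
    """
    wait_loaded = cleanup
    return [
        (0, "reply to V XCF,sysname,OFFLINE; CLEANUP starts"),
        (wait_loaded, "wait state 0A2 loaded; INTERVAL and OPNOTIFY start"),
        (wait_loaded + interval, "INTERVAL expires"),
        (wait_loaded + opnotify, "OPNOTIFY expires; IXC102A issued"),
    ]


if __name__ == "__main__":
    # Values from the original question: INTERVAL(85) OPNOTIFY(87) CLEANUP(15)
    for seconds, event in removal_timeline(interval=85, opnotify=87, cleanup=15):
        print(f"t+{seconds:>3}s  {event}")
```

With the values from the question, and under the assumptions above, IXC102A would show up at most about 102 seconds after the reply to the vary command. It says nothing about when TCPIP starts its own timers.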

