When you issue the VARY XCF command to remove an active system from the
sysplex, XCF will drive the group exit routines of any active members on
the subject system to inform them that the system is going to be removed
from the sysplex.  XCF also initiates an ENF signal to similarly inform
those applications that are listening for the "system being removed from
the sysplex" event.  The intended purpose of these notifications is to give
the recipient a chance to perform a normal shutdown before the system enters
a wait state (the CLEANUP interval determines how much time they get).

Many installations likely have orderly shutdown procedures such that, by the
time all the applications and subsystems have been shut down, there might
not be anyone left to process these notifications.  Whether anyone is left
depends on what software you are running and whether that software does
anything with the notifications.  But in case someone is, I think it would
be prudent to issue the VARY XCF command to give them a chance to clean up
normally.


Once the system is in a wait-state, XCF needs to remove it from the
sysplex.  First, the surviving systems must detect that the system is no
longer operational.  If the sysplex is configured to have XCF exploit BCPii
services, that detection will generally occur within 6 seconds or so.  If
not, it will take as long as the "Failure Detection Interval" for the
surviving systems to suspect that the system might be dead.  For most
installations, the FDI is on the order of 165 seconds.
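
To make the two detection cases concrete, here is a rough Python sketch.  It
is only a toy model built from the approximate figures quoted above (about 6
seconds with BCPii, otherwise the FDI); the function and constant names are
made up for illustration and nothing here reads values from a real sysplex.

```python
# Toy model only: the 6 s and 165 s figures are the approximate values
# quoted in the text, not values read from any real sysplex.
BCPII_DETECT_SECONDS = 6.0      # "within 6 seconds or so"
TYPICAL_FDI_SECONDS = 165.0     # "on the order of 165 seconds"


def detection_delay(bcpii_enabled, fdi_seconds=TYPICAL_FDI_SECONDS):
    """Return the rough time until survivors notice the wait-stated system."""
    if fdi_seconds <= 0:
        raise ValueError("fdi_seconds must be positive")
    if bcpii_enabled:
        # With BCPii, detection generally does not wait for the FDI to expire.
        return BCPII_DETECT_SECONDS
    # Without BCPii, survivors only become suspicious once the FDI expires.
    return fdi_seconds


if __name__ == "__main__":
    for enabled in (True, False):
        print(f"BCPii={enabled}: roughly {detection_delay(enabled):.0f} seconds")
```

Substitute your own FDI for the default if your installation uses a
different value.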

Having decided that the unresponsive system might be dead, XCF will want to
take action to resolve the problem.  If the sysplex is configured to have
XCF exploit BCPii services, XCF will generally complete removal of the dead
system from the sysplex immediately after the system was discovered to be
in a wait-state.  If not so configured, read on.

Starting with z/OS V1R11, XCF will, by default, proceed as if every system
had an active SFM policy that specified ISOLATETIME(0).  You would have to
create and activate an SFM policy to override this default action.  So
once the system enters a wait state, it will be unresponsive.  After the FDI
expires, the surviving systems must ensure that the dead system is isolated
from shared resources (to avoid data corruption).  Once the system is
safely isolated, removal is complete.

To isolate the system, one or more of the following will be used:
- Fencing, which requires a parallel sysplex since the fencing commands are
  sent via a CF
- BCPii services that allow XCF to do an appropriate reset
- Operator responding "DOWN" (after having done an appropriate reset)
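
The prerequisites in the list above can be summarized with a small Python
sketch.  This is just an illustration of which isolation methods are
possible for a given configuration; the function name and the strings are
hypothetical, and the order of the returned list is not a statement about
which method XCF tries first.

```python
def available_isolation_methods(parallel_sysplex, bcpii_enabled):
    """List the isolation methods that a given configuration makes possible."""
    methods = []
    if parallel_sysplex:
        # Fencing commands are sent via a CF, so a parallel sysplex is required.
        methods.append("fencing via CF")
    if bcpii_enabled:
        # XCF can perform an appropriate reset itself through BCPii services.
        methods.append("BCPii reset")
    # The operator can always reset the system manually and then reply DOWN.
    methods.append("operator reset + DOWN reply")
    return methods


if __name__ == "__main__":
    for plex in (True, False):
        for bcpii in (True, False):
            print(plex, bcpii, available_isolation_methods(plex, bcpii))
```

With neither a CF nor BCPii, the only entry left is the manual reset followed
by the operator's DOWN reply.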

Mark A. Brooks
z/OS Sysplex design and development
845-435-5149   T/L 8-295-5149
Poughkeepsie, NY