Fencing devices that do not reboot a node, but just cut off its storage access, have always required the impractical step of re-enabling storage access after the node has been reset. We've never provided a mechanism to automate this unfencing.

Below is an outline of how we might automate unfencing with some simple extensions to the existing fencing library, config scheme and agents. It does not involve the fencing daemon (fenced). Nodes would unfence themselves when they start up. We might also consider a scheme where a node is unfenced by *other* nodes when it starts up, if that has any advantage over self-unfencing.

cluster3 is the context, but a similar thing would apply to a next generation unified fencing system, e.g.
https://www.redhat.com/archives/cluster-devel/2008-October/msg00005.html

init.d/cman would run:

  cman_tool join
  fence_node -U <ourname>
  qdiskd
  groupd
  fenced
  dlm_controld
  gfs_controld
  fence_tool join

The new step, fence_node -U <name>, would call libfence:fence_node_undo(name).
[fence_node <name> currently calls libfence:fence_node(name) to fence a node.]

libfence:fence_node_undo(node_name) logic:

  for each device_name under the given node_name,
  if an unfencedevice exists with name=device_name, then
  run the unfencedevice agent with a first arg of "undo"
  and, as the other args, the normal combination of node and device args
  (any agent used with unfencing must recognize/support "undo")

[logic derived from cluster.conf structure and similar to fence_node logic]

Example 1:

  <clusternode name="foo" nodeid="3">
    <fence>
      <method name="1">
        <device name="san" node="foo"/>
      </method>
    </fence>
  </clusternode>

  <fencedevices>
    <fencedevice name="san" agent="fence_scsi"/>
  </fencedevices>

  <unfencedevices>
    <unfencedevice name="san" agent="fence_scsi"/>
  </unfencedevices>

fence_node_undo("foo") would:
- fork fence_scsi
- pass arg string: undo node="foo" agent="fence_scsi"

[Note: we've talked about fence_scsi getting a device list from /etc/cluster/fence_scsi.conf instead of from clvm. It would require more user configuration, but would create fewer problems and should be more robust.]

Example 2:

  <clusternode name="bar" nodeid="4">
    <fence>
      <method name="1">
        <device name="switch1" port="4"/>
        <device name="switch2" port="6"/>
      </method>
      <method name="2">
        <device name="apc" port="4"/>
      </method>
    </fence>
  </clusternode>

  <fencedevices>
    <fencedevice name="switch1" agent="fence_brocade" ipaddr="1.1.1.1"/>
    <fencedevice name="switch2" agent="fence_brocade" ipaddr="2.2.2.2"/>
    <fencedevice name="apc" agent="fence_apc" ipaddr="3.3.3.3"/>
  </fencedevices>

  <unfencedevices>
    <unfencedevice name="switch1" agent="fence_brocade" ipaddr="1.1.1.1"/>
    <unfencedevice name="switch2" agent="fence_brocade" ipaddr="2.2.2.2"/>
  </unfencedevices>

fence_node_undo("bar") would:
- fork fence_brocade
- pass arg string: undo port="4" agent="fence_brocade" ipaddr="1.1.1.1"
- fork fence_brocade
- pass arg string: undo port="6" agent="fence_brocade" ipaddr="2.2.2.2"
- ignore device "apc" because it's not found under <unfencedevices>
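
The lookup described for fence_node_undo can be sketched in a few lines of Python. This is only an illustration of the matching logic, not the libfence implementation (which would be C and would read the config through the cluster's own config interface). The function name plan_undo is hypothetical, the config is wrapped in an assumed <cluster> root element, and the method elements are assumed to be written as <method name="1">. Instead of forking the agents, the sketch just returns the agent and arg string that would be used for each device.

```python
import xml.etree.ElementTree as ET

# Example 2 from above, wrapped in an assumed <cluster> root element.
CLUSTER_CONF = """
<cluster>
  <clusternode name="bar" nodeid="4">
    <fence>
      <method name="1">
        <device name="switch1" port="4"/>
        <device name="switch2" port="6"/>
      </method>
      <method name="2">
        <device name="apc" port="4"/>
      </method>
    </fence>
  </clusternode>
  <fencedevices>
    <fencedevice name="switch1" agent="fence_brocade" ipaddr="1.1.1.1"/>
    <fencedevice name="switch2" agent="fence_brocade" ipaddr="2.2.2.2"/>
    <fencedevice name="apc" agent="fence_apc" ipaddr="3.3.3.3"/>
  </fencedevices>
  <unfencedevices>
    <unfencedevice name="switch1" agent="fence_brocade" ipaddr="1.1.1.1"/>
    <unfencedevice name="switch2" agent="fence_brocade" ipaddr="2.2.2.2"/>
  </unfencedevices>
</cluster>
"""


def plan_undo(conf_xml, node_name):
    """Return (agent, arg_string) pairs for the agents that would be run.

    Hypothetical helper: it only plans the calls, it does not fork anything.
    """
    root = ET.fromstring(conf_xml)
    # Index the unfencedevice entries by name for the device_name lookup.
    unfence = {u.get("name"): u for u in root.iter("unfencedevice")}

    plan = []
    for node in root.iter("clusternode"):
        if node.get("name") != node_name:
            continue
        # Walk every device under this node, across all methods.
        for dev in node.iter("device"):
            undev = unfence.get(dev.get("name"))
            if undev is None:
                continue  # no matching unfencedevice, e.g. "apc": ignore it
            args = ["undo"]
            # Node-level device args, minus the name used for the lookup.
            args += [f'{k}="{v}"' for k, v in dev.attrib.items() if k != "name"]
            # Device-level args from the unfencedevice entry.
            args += [f'{k}="{v}"' for k, v in undev.attrib.items() if k != "name"]
            plan.append((undev.get("agent"), " ".join(args)))
    return plan


if __name__ == "__main__":
    for agent, arg_string in plan_undo(CLUSTER_CONF, "bar"):
        print(agent, "<-", arg_string)
```

Run against Example 2, this plans two fence_brocade calls (ports 4 and 6) and skips "apc", matching the walk-through above. Whether unfencing should try every method or stop at the first one is left open here; the sketch simply visits all devices listed under the node.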
