[Cluster-devel] unfencing

David Teigland Fri, 20 Feb 2009 13:46:36 -0800

Fencing devices that do not reboot a node, but only cut off its storage, have
always required the impractical manual step of re-enabling storage access after
the node has been reset.  We've never provided a mechanism to automate this
unfencing.


Below is an outline of how we might automate unfencing with some simple
extensions to the existing fencing library, config scheme and agents.  It does
not involve the fencing daemon (fenced).  Nodes would unfence themselves when
they start up.  We might also consider a scheme where a node is unfenced by
*other* nodes when it starts up, if that has any advantage over
self-unfencing.

cluster3 is the context here, but the same approach would apply to a
next-generation unified fencing system, e.g.
https://www.redhat.com/archives/cluster-devel/2008-October/msg00005.html

init.d/cman would run:
        cman_tool join
        fence_node -U <ourname>
        qdiskd
        groupd
        fenced
        dlm_controld
        gfs_controld
        fence_tool join

The new step fence_node -U <name> would call libfence:fence_node_undo(name).
[fence_node <name> currently calls libfence:fence_node(name) to fence a node.]
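As a rough illustration of the new -U switch, here is a minimal C sketch of how
fence_node might dispatch between the two library calls.  fence_node_main and
last_call are hypothetical names, and both library functions are stubbed with
simplified one-argument signatures; those signatures are assumptions, not the
real libfence API.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the libfence entry points named above.  They
 * only record which call was made; the real functions take different
 * arguments and actually run fence agents. */
static const char *last_call = "";

static int fence_node(const char *name)
{
    last_call = "fence";
    printf("fence %s\n", name);
    return 0;
}

static int fence_node_undo(const char *name)
{
    last_call = "undo";
    printf("unfence %s\n", name);
    return 0;
}

/* Handles "fence_node [-U] <name>": with -U the node is unfenced,
 * otherwise it is fenced.  Returns -1 on a usage error. */
static int fence_node_main(int argc, char **argv)
{
    int undo = 0;
    const char *name = NULL;

    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-U") == 0)
            undo = 1;
        else if (argv[i][0] == '-')
            return -1;              /* unknown option */
        else if (name == NULL)
            name = argv[i];
        else
            return -1;              /* more than one node name */
    }
    if (name == NULL)
        return -1;                  /* node name is required */

    return undo ? fence_node_undo(name) : fence_node(name);
}
```

For example, an argv of {"fence_node", "-U", "foo"} ends up calling
fence_node_undo("foo"), while {"fence_node", "bar"} fences bar as before.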

libfence:fence_node_undo(node_name) logic:
        for each device_name under the given node_name (in every method),
        if an unfencedevice exists with name=device_name, then
        run the unfencedevice agent with a first arg of "undo"
        and the other args formed from the normal combination of node and
        device args
        (any agent used with unfencing must recognize/support "undo")

[logic derived from cluster.conf structure and similar to fence_node logic]
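A minimal C sketch of the matching step, assuming cluster.conf has already been
parsed into simple in-memory arrays.  The struct and function names here are
hypothetical; a real implementation would query the config system instead.  It
walks every device in every method of one node, looks each one up by name among
the unfencedevices, and skips any device that has no match.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, already-parsed view of cluster.conf (illustration only). */
struct dev_ref {                /* <device name=... .../> under a <method> */
    const char *name;
};

struct method {
    const struct dev_ref *devs;
    size_t ndevs;
};

struct unfencedevice {          /* <unfencedevice name=... agent=... .../> */
    const char *name;
    const char *agent;
};

/* Returns the unfencedevice whose name matches, or NULL if none does. */
static const struct unfencedevice *
find_unfencedevice(const struct unfencedevice *uds, size_t nuds,
                   const char *name)
{
    for (size_t i = 0; i < nuds; i++)
        if (strcmp(uds[i].name, name) == 0)
            return &uds[i];
    return NULL;
}

/* Collects, in config order, the unfencedevice entries matching any device
 * in any method of one node.  Devices without a matching unfencedevice are
 * skipped.  Returns the number of entries written to out. */
static size_t select_unfence_devices(const struct method *methods,
                                     size_t nmethods,
                                     const struct unfencedevice *uds,
                                     size_t nuds,
                                     const struct unfencedevice **out,
                                     size_t max_out)
{
    size_t n = 0;

    for (size_t m = 0; m < nmethods; m++)
        for (size_t d = 0; d < methods[m].ndevs; d++) {
            const struct unfencedevice *ud =
                find_unfencedevice(uds, nuds, methods[m].devs[d].name);
            if (ud != NULL && n < max_out)
                out[n++] = ud;
        }
    return n;
}
```

With node "bar" from Example 2, this would select switch1 and switch2 and skip
apc, since apc has no <unfencedevice> entry.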

Example 1:

<clusternode name="foo" nodeid="3">
        <fence>
                <method name="1">
                        <device name="san" node="foo"/>
                </method>
        </fence>
</clusternode>

<fencedevices>
        <fencedevice name="san" agent="fence_scsi"/>
</fencedevices>

<unfencedevices>
        <unfencedevice name="san" agent="fence_scsi"/>
</unfencedevices>

fence_node_undo("foo") would:
- fork fence_scsi
- pass arg string: undo node="foo" agent="fence_scsi"
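On the agent side, an agent that supports unfencing has to recognize the
leading "undo" and pick out its node and device args.  A minimal C sketch,
assuming the args arrive as the single space-separated key="value" string shown
above (the exact delivery mechanism is not specified here); is_undo and get_arg
are hypothetical helper names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if the arg string starts with the word "undo" (followed by a space
 * or the end of the string). */
static bool is_undo(const char *args)
{
    return strncmp(args, "undo", 4) == 0 &&
           (args[4] == '\0' || args[4] == ' ');
}

/* Copies the value of key="value" from args into out (size outlen).
 * Returns true if the key was found and the value fit.  Assumes values
 * contain no embedded double quotes. */
static bool get_arg(const char *args, const char *key, char *out,
                    size_t outlen)
{
    size_t klen = strlen(key);
    const char *p = args;

    while ((p = strstr(p, key)) != NULL) {
        /* Require a word boundary before the key and =" right after it. */
        bool at_start = (p == args || p[-1] == ' ');
        if (at_start && p[klen] == '=' && p[klen + 1] == '"') {
            const char *v = p + klen + 2;
            const char *end = strchr(v, '"');
            if (end == NULL)
                return false;           /* unterminated value */
            size_t vlen = (size_t)(end - v);
            if (vlen + 1 > outlen)
                return false;           /* value does not fit */
            memcpy(out, v, vlen);
            out[vlen] = '\0';
            return true;
        }
        p += klen;
    }
    return false;
}
```

For the string above, get_arg(args, "node", ...) yields "foo" and a lookup of a
key that is not present, such as "port", returns false.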

[Note: we've talked about fence_scsi getting a device list from
 /etc/cluster/fence_scsi.conf instead of from clvm.  It would require
 more user configuration, but would create fewer problems and should
 be more robust.]
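If fence_scsi did read its device list from /etc/cluster/fence_scsi.conf, the
parsing could be as simple as the C sketch below.  The file format is an
assumption (one device path per line, blank lines and #-comments ignored), and
parse_device_list and DEV_PATH_MAX are hypothetical names; the function works
on the file's contents after they have been read into memory.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define DEV_PATH_MAX 256

/* Parses a device list held in memory, one path per line.  Surrounding
 * whitespace is trimmed; blank lines, '#' comment lines and paths too long
 * for DEV_PATH_MAX are skipped.  Returns the number of paths stored. */
static size_t parse_device_list(const char *text, char out[][DEV_PATH_MAX],
                                size_t max)
{
    size_t n = 0;
    const char *line = text;

    while (*line != '\0' && n < max) {
        const char *eol = strchr(line, '\n');
        const char *end = eol ? eol : line + strlen(line);
        const char *s = line;

        while (s < end && isspace((unsigned char)*s))
            s++;                        /* trim leading whitespace */
        const char *e = end;
        while (e > s && isspace((unsigned char)e[-1]))
            e--;                        /* trim trailing whitespace */

        size_t len = (size_t)(e - s);
        if (len > 0 && *s != '#' && len < DEV_PATH_MAX) {
            memcpy(out[n], s, len);
            out[n][len] = '\0';
            n++;
        }
        line = eol ? eol + 1 : end;
    }
    return n;
}
```

For example, the text "# shared storage\n/dev/sdb\n\n  /dev/sdc  \n" gives
two entries, /dev/sdb and /dev/sdc.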

Example 2:

<clusternode name="bar" nodeid="4">
        <fence>
                <method name="1">
                        <device name="switch1" port="4"/>
                        <device name="switch2" port="6"/>
                </method>
                <method name="2">
                        <device name="apc" port="4"/>
                </method>
        </fence>
</clusternode>

<fencedevices>
        <fencedevice name="switch1" agent="fence_brocade" ipaddr="1.1.1.1"/>
        <fencedevice name="switch2" agent="fence_brocade" ipaddr="2.2.2.2"/>
        <fencedevice name="apc" agent="fence_apc" ipaddr="3.3.3.3"/>
</fencedevices>

<unfencedevices>
        <unfencedevice name="switch1" agent="fence_brocade" ipaddr="1.1.1.1"/>
        <unfencedevice name="switch2" agent="fence_brocade" ipaddr="2.2.2.2"/>
</unfencedevices>

fence_node_undo("bar") would:
- fork fence_brocade
- pass arg string: undo port="4" agent="fence_brocade" ipaddr="1.1.1.1"
- fork fence_brocade
- pass arg string: undo port="6" agent="fence_brocade" ipaddr="2.2.2.2"
- ignore device "apc" because it's not found under <unfencedevices>
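The arg strings above can be produced by concatenating "undo", the attributes
from the node's <device> line, and the attributes from the matching
<unfencedevice>, leaving out the name attribute, which is only used for
matching.  A minimal C sketch under that assumption; struct attr and
build_undo_args are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

struct attr {
    const char *key;
    const char *val;
};

/* Builds the arg string for one unfence agent run:
 *     undo <node-level device attrs> <unfencedevice attrs>
 * "name" attributes are skipped.  Returns 0 on success, or -1 if buf is
 * too small to hold the whole string. */
static int build_undo_args(const struct attr *node_attrs, size_t nnode,
                           const struct attr *dev_attrs, size_t ndev,
                           char *buf, size_t buflen)
{
    int w = snprintf(buf, buflen, "undo");
    if (w < 0 || (size_t)w >= buflen)
        return -1;
    size_t used = (size_t)w;

    const struct attr *lists[2] = { node_attrs, dev_attrs };
    size_t counts[2] = { nnode, ndev };

    for (int l = 0; l < 2; l++)
        for (size_t i = 0; i < counts[l]; i++) {
            if (strcmp(lists[l][i].key, "name") == 0)
                continue;               /* used only for matching */
            w = snprintf(buf + used, buflen - used, " %s=\"%s\"",
                         lists[l][i].key, lists[l][i].val);
            if (w < 0 || (size_t)w >= buflen - used)
                return -1;              /* output truncated */
            used += (size_t)w;
        }
    return 0;
}
```

With the switch1 entries from Example 2 this yields
undo port="4" agent="fence_brocade" ipaddr="1.1.1.1".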
