On Mon, 2009-02-23 at 12:15 -0600, David Teigland wrote:
> On Mon, Feb 23, 2009 at 07:27:20AM +0100, Fabio M. Di Nitto wrote:
> > > libfence:fence_node_undo(node_name) logic:
> > >   for each device_name under given node_name,
> > >     if an unfencedevice exists with name=device_name, then
> > >       run the unfencedevice agent with first arg of "undo"
> > >       and other args the normal combination of node and device args
> > >   (any agent used with unfencing must recognize/support "undo")
> >
> > All our agents already support on/off enable/disable operations. It's
> > probably best to align them to have the same config options rather than
> > adding a new one across the board.
>
> Yes, I have those options in mind, and would prefer to use them as well.
> We'll have to wait and see during the implementation phase; for the time being
> they complicate things, so I'm using "undo" to avoid those details.
>
I know Marek is about to start a "matrix" to map fence agent features and
options. It might be a good idea to talk to him soon. We were discussing
it only a few hours ago.

> The meanings of those fencing structures have never changed since being
> introduced many years ago, and both of those fundamentally change it. It
> would be very unfortunate to redefine them.

I agree. It's a good point.

>
> A good alternative to <unfencedevices> would be an <unfence> section within
> the node sections (it would not require a method level).... Now that I've
> thought more about it, it seems a better choice than "unfencedevices". It
> defines explicitly what should be done, rather than depending on the implicit
> effects of matching names between fencedevice/unfencedevice.

Agreed.

>
> <clusternode name="foo" nodeid="3">
>   <fence>
>     <method name="1">
>       <device name="san" node="foo"/>
>     </method>
>   </fence>
>
>   <unfence>
>     <device name="san" node="foo"/>
>   </unfence>
> </clusternode>
>
> <fencedevices>
>   <fencedevice name="san" agent="fence_scsi"/>
> </fencedevices>
>
> and
>
> <clusternode name="bar" nodeid="4">
>   <fence>
>     <method name="1">
>       <device name="switch1" port="4"/>
>       <device name="switch2" port="6"/>
>     </method>
>     <method name="2">
>       <device name="apc" port="4"/>
>     </method>
>   </fence>
>
>   <unfence>
>     <device name="switch1" port="4"/>
>     <device name="switch2" port="6"/>
>   </unfence>
> </clusternode>
>
> <fencedevices>
>   <fencedevice name="switch1" agent="fence_brocade" ipaddr="1.1.1.1"/>
>   <fencedevice name="switch2" agent="fence_brocade" ipaddr="2.2.2.2"/>
>   <fencedevice name="apc" agent="fence_apc" ipaddr="3.3.3.3"/>
> </fencedevices>
>
> The key thing I've realized since the previous attempt in 2004, is that we
> need to explicitly configure what unfencing should happen, rather than just
> trying to apply the normal fencing config in reverse.

I think I was trying to apply this same logic and stalled at some point
in the apc+brocade example.
With more than one fence agent, the number of combinations needed to
fence and then safely unfence a node simply grows exponentially.

Given this last example, a reasonable unfence operation would be to try
to power on via apc too. There is no guarantee that only method="1"
fenced the node, and the node could be powered off. If we succeed in
enabling the switch port, we still can't guarantee that the node will
come back, because it may have no power.

How do we protect a node that failed to be fenced from being unfenced?

Example 2: both method="1" and method="2" fail to fence node X. At this
point any unfence operation is extremely dangerous.

Fabio
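
P.S. To make the explicit <unfence> proposal a bit more concrete, here is
a rough sketch (not libfence code) of how the section could be resolved
into agent invocations. It only builds a plan and runs nothing. The
<cluster> wrapper element, the function name unfence_plan and the plan
format are my own assumptions for illustration. The unfence entries
mirror the devices of fence method 1 for node "bar", and "undo" is the
placeholder action from David's pseudocode, which may well end up being
on/enable instead.

```python
import xml.etree.ElementTree as ET

# Fragment modelled on the "bar" example. The <cluster> root is a
# hypothetical wrapper so that the fragment parses as one document.
CONF = """
<cluster>
  <clusternode name="bar" nodeid="4">
    <unfence>
      <device name="switch1" port="4"/>
      <device name="switch2" port="6"/>
    </unfence>
  </clusternode>
  <fencedevices>
    <fencedevice name="switch1" agent="fence_brocade" ipaddr="1.1.1.1"/>
    <fencedevice name="switch2" agent="fence_brocade" ipaddr="2.2.2.2"/>
    <fencedevice name="apc" agent="fence_apc" ipaddr="3.3.3.3"/>
  </fencedevices>
</cluster>
"""


def unfence_plan(conf_xml, node_name, action="undo"):
    """Return a list of (agent, action, args) tuples for one node.

    Hypothetical helper: it reads only the node's explicit <unfence>
    section and never tries to reverse the <fence> methods.
    """
    root = ET.fromstring(conf_xml)
    # Index the shared <fencedevice> definitions by name.
    shared = {fd.get("name"): dict(fd.attrib) for fd in root.iter("fencedevice")}

    plan = []
    for node in root.iter("clusternode"):
        if node.get("name") != node_name:
            continue
        for dev in node.findall("./unfence/device"):
            name = dev.get("name")
            if name not in shared:
                raise KeyError("no fencedevice named %r" % name)
            # Combine device-level args with per-node args; per-node wins.
            args = {**shared[name], **dev.attrib}
            agent = args.pop("agent")
            args.pop("name")
            plan.append((agent, action, args))
    return plan


if __name__ == "__main__":
    for agent, action, args in unfence_plan(CONF, "bar"):
        print(agent, action, args)
```

Note that the plan for "bar" never touches apc, which is exactly the gap
described above: if method 2 was the one that fenced the node, this
unfence would re-enable the switch ports and leave the node powered off.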
