On Tuesday, August 10, 2010, David Noriega wrote:
> So your script resets the server so there is no fail-over (i.e. the other
> server takes over resources from that server?) or there is failover
> but you then manually return resources back to the server that was
> reset?

Our DDN IPMI stonith script (external/ipmi_ddn in heartbeat/pacemaker stonith terms) only makes absolutely sure that the node was really reset. If something fails, an error code is reported to pacemaker, and pacemaker (*) will then not initiate resource fail-over, in order to prevent split-brain.

Since Lustre devices use MMP (multiple-mount protection), that is not strictly required, in principle. But if something goes wrong, e.g. MMP was accidentally not enabled, a double mount could occur, and that would cause serious filesystem and data corruption...

Cheers,
Bernd

PS: (*) heartbeat-v1 (and v2/v3 if not in xml/crm mode) also *should* accept stonith error codes, but in general, I have seen more than once that heartbeat-v1 ran into split-brain and started resources on both cluster nodes. That is something where pacemaker does a much better job.

-- 
Bernd Schubert
DataDirect Networks
_______________________________________________
Lustre-discuss mailing list
http://lists.lustre.org/mailman/listinfo/lustre-discuss
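
For illustration, the "only report success if the node was verifiably reset" behaviour described above could be sketched roughly as below. This is not the actual external/ipmi_ddn script. The function name fence_node, its three callables, and the strategy of confirming "off" before powering back on are all assumptions made for the example. The only property taken from the post is that a non-zero status is returned whenever the reset cannot be confirmed, so that the cluster manager does not start fail-over.

```python
import time
from typing import Callable


def fence_node(
    power_off: Callable[[], None],
    power_on: Callable[[], None],
    power_status: Callable[[], str],
    retries: int = 5,
    delay: float = 1.0,
) -> int:
    """Hypothetical fencing helper: return 0 only if the node was
    verifiably powered off and then powered on again, else 1."""
    try:
        power_off()
        # Poll until the management controller confirms the node is off.
        for _ in range(retries):
            if power_status() == "off":
                break
            time.sleep(delay)
        else:
            # Never confirmed "off": report failure so that the cluster
            # manager does not take over the node's resources.
            return 1
        power_on()
        return 0
    except Exception:
        # Any communication error also counts as "not verified".
        return 1
```

In a real stonith plugin the three callables would wrap whatever talks to the node's management controller, and the return value would become the script's exit status. The point of the design is that an unverified reset is treated exactly like a failed one.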
