On Tuesday, August 10, 2010, Kevin Van Maren wrote:
> Depends on the HA package you are using. Heartbeat comes with a script
> that supports IPMI.

For our installations we even use a modified external/ipmi_ddn stonith script that uses power-off/status/on to make sure the system is really reset.

The heartbeat/pacemaker script uses the IPMI reset method by default, but IPMI commands are not required by the spec to succeed. So ipmitool (used by external/ipmi) might return successfully, but that in no way ensures the node was really reset. I have seen that rather often in real life already. The default script also supports the power-off/on method, but it does not check the status either.

So our modified script first powers off, then checks that the node is really offline, then powers on again, and only then returns successfully.

Unfortunately, that comes at the cost of an increased fail-over time, as power-off followed by power-on needs some minimal downtime in between (ca. 30s), and heartbeat/pacemaker stonith does not support async events (power-off alone would be sufficient, but once stonith returns successfully, it is not called again until the next fencing).

-- 
Bernd Schubert
DataDirect Networks
_______________________________________________
Lustre-discuss mailing list
http://lists.lustre.org/mailman/listinfo/lustre-discuss
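
The power-off/status/on sequence described above could be sketched roughly as follows. This is not the external/ipmi_ddn script itself (it is not shown in this message), only an illustration in Python. The names fence_node and run_ipmitool, the retry count, the polling interval, and the way connection options are passed in base_args are all assumptions; the only ipmitool subcommands used are "chassis power off", "chassis power status", and "chassis power on".

```python
import subprocess
import time
from typing import Callable, List, Sequence, Tuple

# A runner takes ipmitool arguments and returns (exit_code, stdout).
Runner = Callable[[Sequence[str]], Tuple[int, str]]


def run_ipmitool(args: Sequence[str]) -> Tuple[int, str]:
    """Invoke the real ipmitool binary (hypothetical helper for this sketch)."""
    proc = subprocess.run(["ipmitool", *args], capture_output=True, text=True)
    return proc.returncode, proc.stdout


def fence_node(
    base_args: Sequence[str],
    run: Runner = run_ipmitool,
    sleep: Callable[[float], None] = time.sleep,
    retries: int = 10,          # assumed value
    poll_interval: float = 3.0,  # assumed value, seconds
    off_wait: float = 30.0,      # "ca. 30s" downtime mentioned above
) -> bool:
    """Return True only after the node was confirmed off and powered on again."""
    power: List[str] = [*base_args, "chassis", "power"]

    # Step 1: request power-off. A zero exit code alone proves nothing.
    rc, _ = run([*power, "off"])
    if rc != 0:
        return False

    # Step 2: poll the power status until the BMC reports the node is off.
    for _ in range(retries):
        rc, out = run([*power, "status"])
        if rc == 0 and "is off" in out.lower():
            break
        sleep(poll_interval)
    else:
        # Never confirmed off: report the fencing attempt as failed.
        return False

    # Step 3: keep the node down for a minimal time, then power it on again.
    sleep(off_wait)
    rc, _ = run([*power, "on"])
    return rc == 0
```

A call might look like fence_node(["-I", "lanplus", "-H", "node1-bmc", "-U", "admin", "-P", "secret"]), where the host name and credentials are placeholders. Passing run and sleep as parameters is only there so the sequence can be exercised without a real BMC.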
