To add to this: instead of issuing a straight reboot, I prefer running `pcs stonith fence <node>`, which fails over resources appropriately and reboots the node (if possible) or otherwise powers it off. The advantage of doing it this way is that Pacemaker stays informed about the node's state, so it doesn't (usually) shoot the node while it's trying to boot back up. When you do maintenance on a node without letting Pacemaker know about it, results can be unpredictable.
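
For anyone who wants to script this, here is a minimal sketch of the fencing approach. The `fence_node` wrapper, the `DRY_RUN` variable, and the node name `oss01` are hypothetical, made up for illustration. Only `pcs stonith fence <node>` comes from the above. It defaults to printing the command rather than running it, since fencing the wrong node is an expensive mistake.

```shell
#!/bin/sh
# fence_node <node>: hypothetical wrapper around `pcs stonith fence`.
# With DRY_RUN=1 (the default in this sketch) it only prints the command.
fence_node() {
    node="$1"
    if [ -z "$node" ]; then
        echo "usage: fence_node <node>" >&2
        return 2
    fi
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Show what would be run without touching the cluster.
        echo "pcs stonith fence $node"
    else
        # Ask Pacemaker to fence the node through its configured stonith device.
        pcs stonith fence "$node"
    fi
}
```

Something like `DRY_RUN=0 fence_node oss01`, run from a surviving cluster member, would do the real thing. Checking `pcs status` afterwards should show where the resources ended up.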

Cameron

On 3/5/25 2:12 PM, Laura Hild via lustre-discuss wrote:
I'm not sure what to say about how Pacemaker *should* behave, but I *can* say I 
virtually never try to (cleanly) reboot a host from which I have not already 
evacuated all resources, e.g. with `pcs node standby` or by putting Pacemaker 
in maintenance mode and unmounting/exporting everything manually.  If I can't 
evacuate all resources and complete a lustre_rmmod, the host is getting 
power-cycled.
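
A rough sketch of the evacuate-first sequence described above, assuming it is run on the node being rebooted. The helper names `run` and `evacuate_and_reboot`, the `DRY_RUN` variable, and the node name `oss01` are hypothetical. The `pcs node standby` and `lustre_rmmod` steps are the ones named above, and `systemctl reboot` is an assumption about how the clean reboot is issued. By default it only prints the commands.

```shell
#!/bin/sh
# run: print the command when DRY_RUN=1 (the default here), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# evacuate_and_reboot <node>: hypothetical helper, intended to run on <node> itself.
evacuate_and_reboot() {
    node="$1"
    if [ -z "$node" ]; then
        echo "usage: evacuate_and_reboot <node>" >&2
        return 2
    fi
    # Move all resources off the node before touching it.
    run pcs node standby "$node" || return 1
    # In real use, confirm with `pcs status` that the resources have moved
    # before continuing; this sketch does not wait or poll.
    # Unload the Lustre modules; if this fails, stop rather than reboot cleanly.
    run lustre_rmmod || {
        echo "lustre_rmmod failed; consider power-cycling instead" >&2
        return 1
    }
    run systemctl reboot
}
```

Once the node is back up, `pcs node unstandby <node>` makes it eligible to host resources again.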

So maybe I can say, my guess would be that in the host's shutdown process, 
stopping the Pacemaker service happens before filesystems are unmounted, and 
that Pacemaker doesn't want to make an assumption whether its own shut-down 
means it should standby or initiate maintenance mode, and therefore the other 
host ends up knowing only that its partner has disappeared, while the 
filesystems have yet to be unmounted.

_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
