There are some cases which I don’t believe can be caught with callbacks (e.g. 
DMS = Dead Man Switch).  But you could possibly use preStartup to check the 
host uptime and infer whether GPFS was restarted long after the host 
booted.  You could also peek in /tmp/mmfs and only report if you find something 
there.  That said, the docs say that preStartup fires after the node joins the 
cluster.  If that means once the node is ‘active’, then you might miss 
nodes stuck in ‘arbitrating’ for a while due to a waiter problem.
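
A minimal sketch of that preStartup idea, assuming a Linux node where uptime 
can be read from /proc/uptime.  The threshold, the function names, and the 
choice to require both signals (long uptime and a non-empty /tmp/mmfs) are 
illustrative assumptions, not anything GPFS provides:

```python
#!/usr/bin/env python3
"""Sketch: guess whether GPFS restarted without a host reboot."""
import os

DUMP_DIR = "/tmp/mmfs"          # dump location mentioned in this thread
UPTIME_THRESHOLD_S = 15 * 60    # hypothetical cutoff; tune to your boot times


def read_uptime_seconds(path="/proc/uptime"):
    """Return host uptime in seconds (first field of /proc/uptime, Linux only)."""
    with open(path) as f:
        return float(f.read().split()[0])


def list_dump_files(dump_dir=DUMP_DIR):
    """Return the sorted entries in dump_dir, or [] if it cannot be listed."""
    try:
        return sorted(os.listdir(dump_dir))
    except OSError:
        return []


def suspect_restart(uptime_s, dump_files, threshold_s=UPTIME_THRESHOLD_S):
    """True when the host has been up longer than the threshold AND dumps exist."""
    return uptime_s > threshold_s and bool(dump_files)


def main():
    try:
        uptime = read_uptime_seconds()
    except OSError:
        print("cannot read uptime; skipping check")
        return
    dumps = list_dump_files()
    if suspect_restart(uptime, dumps):
        print(f"possible GPFS crash/restart: host up {uptime:.0f}s, "
              f"{len(dumps)} entries in {DUMP_DIR}")


if __name__ == "__main__":
    main()
```

A script like this could be registered as a preStartup callback with 
mmaddcallback, or simply run from cron.  It only prints a message; wiring it 
into mail or a monitoring system is left open.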

We run a script from cron which monitors the myriad things that can go wrong, 
attempts to right those which are safe to fix, and raises alerts 
appropriately.  Something like that, outside the reach of GPFS, is often a good 
choice if you don’t need to know something the moment it happens.
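
To illustrate the general shape of such a cron-driven monitor (this is not 
our actual script), here is a rough sketch.  The Check class, run_checks, and 
the rule "alert only if the problem persists after a safe fix" are all 
hypothetical choices made for the example:

```python
#!/usr/bin/env python3
"""Sketch: periodic health checks with optional safe fixes and alerts."""
import os
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Check:
    name: str
    is_healthy: Callable[[], bool]                  # returns True when all is well
    safe_fix: Optional[Callable[[], None]] = None   # only set if the fix is safe


def run_checks(checks: List[Check], alert: Callable[[str], None]) -> List[str]:
    """Run every check, try its safe fix if one exists, and alert on what remains.

    Returns the names of the checks that triggered an alert.
    """
    alerted = []
    for check in checks:
        if check.is_healthy():
            continue
        if check.safe_fix is not None:
            check.safe_fix()
            if check.is_healthy():      # fix worked, nothing to report
                continue
        alert(f"{check.name}: needs attention")
        alerted.append(check.name)
    return alerted


def no_dumps_present(dump_dir="/tmp/mmfs"):
    """Healthy when the dump directory is missing or empty."""
    try:
        return not os.listdir(dump_dir)
    except OSError:
        return True


if __name__ == "__main__":
    # Example wiring: a single check with no automatic fix, alerting via print.
    run_checks([Check("gpfs dumps in /tmp/mmfs", no_dumps_present)], alert=print)
```

Run from a crontab entry at whatever interval suits you, with print swapped 
for mail or your monitoring system's own alert hook.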

Thx
Paul

From: gpfsug-discuss-boun...@spectrumscale.org 
<gpfsug-discuss-boun...@spectrumscale.org> On Behalf Of Oesterlin, Robert
Sent: Wednesday, January 30, 2019 3:52 PM
To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
Subject: [gpfsug-discuss] Node ‘crash and restart’ event using GPFS callback?

Anyone crafted a good way to detect a node ‘crash and restart’ event using GPFS 
callbacks? I’m thinking “preShutdown” but I’m not sure if that’s the best. What 
I’m really looking for is whether the node shut down (aborted) and created a 
dump in /tmp/mmfs.


Bob Oesterlin
Sr Principal Storage Engineer, Nuance
