Hi folks,

We've run into a weird situation where our SAN does "something" during the nightly backups that leaves our file systems stuck in "quiesced" mode forever (the same effect as fsfreeze -f). It started after an update of the SAN firmware, so right now the storage folks are working with the vendor.

In the meantime we have had some random server freezes. So far the best I can manage is a whole bunch of Monit checks where every server touches a file on all the other servers. It seems like overkill, but in an fs-freeze situation I will still get an email.

    check program <server>-sanity with path "/usr/bin/ssh <server> touch .sanity"
        as uid "account" as gid "account"
        timeout 1 second
        alert [email protected] not on { instance, action } with reminder on 30 cycles
        if status != 0 then alert

Can anybody recommend a simpler solution? This just seems like a bit much to find out whether the file system got stuck. The only problem with a "self-check" is that postfix won't send an email when the file system is stuck like this, so it appears that I need something external.

Thanks for any advice.

V/r,
Bryan
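
For reference, here is a rough Python sketch of the kind of write probe each of those checks boils down to. It is only an illustration under assumptions: the function name `write_probe`, the 5-second default, and the polling interval are all made up for this example and are not part of Monit. The write runs in a child process because a writer on a frozen file system blocks until the thaw, so the parent only polls and never waits on the child indefinitely.

```python
import subprocess
import sys
import time

# Child program: create/open the file, write one byte, and force it to disk.
# On a frozen file system this blocks until the file system is thawed.
_CHILD_CODE = (
    "import os, sys\n"
    "fd = os.open(sys.argv[1], os.O_WRONLY | os.O_CREAT, 0o600)\n"
    "os.write(fd, b'x')\n"
    "os.fsync(fd)\n"
    "os.close(fd)\n"
)


def write_probe(path, timeout_s=5.0):
    """Return True if a small write to `path` finishes within `timeout_s`.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    child = subprocess.Popen(
        [sys.executable, "-c", _CHILD_CODE, path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        rc = child.poll()  # non-blocking check of the child's exit status
        if rc is not None:
            return rc == 0
        time.sleep(0.05)
    # Best effort only: a writer blocked on a frozen file system may not
    # exit until the thaw, so we deliberately do not wait() after the kill.
    child.kill()
    return False
```

A wrapper that exits non-zero when `write_probe` returns False could be called the same way as the `touch` command above. The alert itself would still have to be sent from a host whose own file system is not stuck.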
