Hi folks,

We've run into a strange situation: during the nightly backups our SAN
does "something" that leaves our file systems stuck in "quiesced" mode
indefinitely (the same state as fsfreeze -f).  It started after a SAN
firmware update, so the storage team is currently working with the
vendor.

In the meantime, servers have been freezing at random.  So far the best
I've come up with is a whole set of Monit checks in which every server
touches a file on every other server.  It seems like overkill, but when
a file system freezes I still get an email from one of the other
servers.

check program <server>-sanity
  with path "/usr/bin/ssh <server> touch .sanity"
  as uid "account" as gid "account" timeout 1 second
  alert [email protected] not on { instance, action }
  with reminder on 30 cycles
  if status != 0 then alert
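The check above boils down to this: run a command, and treat either a
non-zero exit status or a timeout as a failure.  Here is a minimal Python
sketch of just that probe logic, for illustration only.  It assumes
key-based ssh logins are already set up, and the names probe(),
sanity_cmd() and the host name are hypothetical (they are not part of
Monit).  The one-second default mirrors "timeout 1 second" above.

```python
import subprocess


def sanity_cmd(host):
    """Build the same remote touch the Monit check runs (hypothetical helper).

    BatchMode=yes makes ssh fail instead of prompting for a password.
    """
    return ["ssh", "-o", "BatchMode=yes", host, "touch", ".sanity"]


def probe(cmd, timeout=1.0):
    """Return True if cmd exits with status 0 within `timeout` seconds.

    A timeout counts as a failure, just like a non-zero exit status.
    """
    try:
        proc = subprocess.Popen(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
    except OSError:
        return False  # the command could not be started at all
    try:
        return proc.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        # A write to a frozen file system blocks. Kill the probe but do not
        # wait for it: a process stuck in the kernel may not exit until the
        # file system is thawed, and waiting would hang the checker too.
        proc.kill()
        return False
```

For example, probe(sanity_cmd("server2")) returns False if server2's
file system is frozen (the remote touch never finishes) or if ssh cannot
reach it.  The alert itself still has to be sent from a machine whose
file system is not frozen.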

Can anybody recommend a simpler solution?  This seems like a lot of
machinery just to detect a stuck file system.  The problem with having
each server check itself is that Postfix won't send an email while the
file system is frozen like this, so it appears that I need something
external.

Thanks for any advice.

V/r,
Bryan

--
To unsubscribe:
https://lists.nongnu.org/mailman/listinfo/monit-general
