Hi,

We moved from daemontools to monit very recently and are quite happy with
it. However, we ran into something that is quite difficult to explain, and
we would like some feedback from the community on this topic (to check
whether we missed something here).

We first came up with the following monit configuration:

check process @project.name@[email protected]@.process with pidfile /var/run/@project.name@[email protected]@.pid
      start program = "/etc/init.d/@project.name@[email protected]@start"
      stop program = "/etc/init.d/@project.name@[email protected]@ stop"
      if not exist for 2 cycles then restart
      if not exist for 5 cycles then alert
      if failed host 127.0.0.1 port @service.port@ for 5 cycles then alert
      if cpu > 75% for 5 cycles then alert
      group @project.name@[email protected]@

check file @project.name@[email protected]@.log with path /var/log/@project.name@[email protected]@/current
      if not match "Kumiho startup complete." for 5 cycles then alert
      depends on @project.name@[email protected]@.process
      group @project.name@[email protected]@

With this configuration, monit detected that a service was down (and
notified us), but it somehow failed to restart it. Out of growing despair,
we tried commenting some lines out. This led us to the conclusion that,
even though we don't know why, putting the "restart" statement in the last
position helped.

check process @project.name@[email protected]@.process with pidfile /var/run/@project.name@[email protected]@.pid
      start program = "/etc/init.d/@project.name@[email protected]@start"
      stop program = "/etc/init.d/@project.name@[email protected]@ stop"
      if not exist for 5 cycles then alert
      if cpu > 75% for 5 cycles then alert
      if failed host 127.0.0.1 port @service.port@ for 5 cycles then alert
      if not exist for 2 cycles then restart
      group @project.name@[email protected]@

To get the restart working for the log file check as well, we had to
duplicate the start and stop programs this way:

check file @project.name@[email protected]@.log with path /var/log/@project.name@[email protected]@/current
      start program = "/etc/init.d/@project.name@[email protected]@start"
      stop program = "/etc/init.d/@project.name@[email protected]@ stop"
      if not exist for 5 cycles then alert
      if size > 100 MB for 5 cycles then alert
      if not exist for 2 cycles then restart
      depends on @project.name@[email protected]@.process
      group @project.name@[email protected]@

(Please note that we do use the @...@ values in our configuration; they are
just placeholders, kept here for clarity.)

OK, the question is: did we miss something in how the statements should be
ordered? And is there a way for us to reduce the duplication here?
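
To make the duplication question concrete, here is an untested sketch of what
a reduced version might look like. It rests on two assumptions that are not
confirmed in this message: that monit keeps only one "not exist" rule per
service (so the last one declared wins, which would match what we observed),
and that an alert is sent for every event, restarts included, once a global
"set alert" recipient is defined. The names myapp, the paths and port 8080
are hypothetical stand-ins for our placeholders.

# Untested sketch; myapp, the paths and port 8080 are hypothetical.
check process myapp.process with pidfile /var/run/myapp.pid
      start program = "/etc/init.d/myapp start"
      stop program = "/etc/init.d/myapp stop"
      # A single existence rule instead of separate alert and restart rules.
      if not exist for 2 cycles then restart
      if failed host 127.0.0.1 port 8080 for 5 cycles then alert
      if cpu > 75% for 5 cycles then alert
      group myapp

# The file check only watches the log; start/stop stay on the process check.
check file myapp.log with path /var/log/myapp/current
      if size > 100 MB for 5 cycles then alert
      depends on myapp.process
      group myapp

If those assumptions hold, the start and stop programs would only need to be
declared once, on the process check.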

Thanks for your feedback! :)

-- 
Romain PELISSE,
*"The trouble with having an open mind, of course, is that people will
insist on coming along and trying to put things in it" -- Terry Pratchett*
http://belaran.eu/wordpress/belaran