Hi,

We moved from daemontools to monit very recently and we are quite happy with it. However, we ran into something that is quite difficult to explain, and we would like some feedback from the community on it (to check whether we missed something).
We first came up with the following monit configuration:

  check process @project.name@[email protected]@.process
    with pidfile /var/run/@project.name@[email protected]@.pid
    start program = "/etc/init.d/@project.name@[email protected]@start"
    stop program = "/etc/init.d/@project.name@[email protected]@ stop"
    if not exist for 2 cycles then restart
    if not exist for 5 cycles then alert
    if failed host 127.0.0.1 port @service.port@ for 5 cycles then alert
    if cpu > 75% for 5 cycles then alert
    group @project.name@[email protected]@

  check file @project.name@[email protected]@.log
    with path /var/log/@project.name@[email protected]@/current
    if not match "Kumiho startup complete." for 5 cycles then alert
    depends on @project.name@[email protected]@.process
    group @project.name@[email protected]@

With this configuration, monit was able to detect (and notify us) that a service was down, but it somehow failed to restart it. In growing despair, we tried commenting some lines out. This led us to the conclusion that, even if we don't know why, having the "restart" statement in the last position helped:
  check process @project.name@[email protected]@.process
    with pidfile /var/run/@project.name@[email protected]@.pid
    start program = "/etc/init.d/@project.name@[email protected]@start"
    stop program = "/etc/init.d/@project.name@[email protected]@ stop"
    if not exist for 5 cycles then alert
    if cpu > 75% for 5 cycles then alert
    if failed host 127.0.0.1 port @service.port@ for 5 cycles then alert
    if not exist for 2 cycles then restart
    group @project.name@[email protected]@

To get the restart working for the logfile part as well, we also had to duplicate information this way:

  check file @project.name@[email protected]@.log
    with path /var/log/@project.name@[email protected]@/current
    start program = "/etc/init.d/@project.name@[email protected]@start"
    stop program = "/etc/init.d/@project.name@[email protected]@ stop"
    if not exist for 5 cycles then alert
    if size > 100 MB for 5 cycles then alert
    if not exist for 2 cycles then restart
    depends on @project.name@[email protected]@.process
    group @project.name@[email protected]@

(Please note that we do use the @...@ values in our configuration; those are just placeholders for clarity purposes.)

OK, so the questions are: did we miss something in how the statements should be ordered? And is there a way for us to reduce the duplication here?

Thanks for your feedback! :)

-- 
Romain PELISSE,
*"The trouble with having an open mind, of course, is that people will insist on coming along and trying to put things in it" -- Terry Pratchett*
http://belaran.eu/wordpress/belaran
