Perfect! Thanks Martin.

On 10/16/07, Martin Pala <[EMAIL PROTECTED]> wrote:
> timeout disables the process monitoring and sends alert - the idea is,
> that if the service is in error state too long and all/repeated
> automatic recovery attempts failed, it makes no sense to try it over and
> over and thus it is possible to stop monitoring and alert operator.
>
> Regarding the kill ... you can use the "exec" action like this:
>
>   if cpu is greater than 80% for 5 cycles then exec "/bin/pkill mongrel"
>
> or more specific (reusing pid file):
>
>   if cpu is greater than 80% for 5 cycles then exec "/bin/bash -c 'kill -9 `cat /var/run/mongrel_cluster/mongrel.9006.pid`'"
>
> Martin
>
>
> Michael Steinfeld wrote:
> > So maybe I am a complete idiot... but here is what I have been pondering
> >
> > Every once in a while it seems that monit will attempt to restart
> > mongrels if it meets the specified criteria.. CPU too high/long, too
> > much RAM .. etc
> >
> > What happens is monit will attempt to restart mongrels, but the pids
> > are not dying. Even if I do, "monit -g group stop all" and wait...
> > they don't die. Even attempting to stop the process by itself doesn't
> > work. So I have to send a SIGKILL
> >
> > (I have not been able to figure out what is causing this)
> >
> > So.. I was thinking to have monit send a SIGKILL if 5 cycles doesn't
> > solve the issue.
> >
> > #my monit service for mongrels
> > check process mongrel_9006
> >   with pidfile /var/run/mongrel_cluster/mongrel.9006.pid
> >   start program = "/usr/local/bin/mongrel_rails cluster::start -C /etc/mongrel_cluster/mongrel_cluster.yml --clean --only 9006"
> >   stop program = "/usr/local/bin/mongrel_rails cluster::stop -C /etc/mongrel_cluster/mongrel_cluster.yml --clean --only 9006"
> >   if totalmem is greater than 110.0 MB for 3 cycles then
> >     restart # eating up memory?
> >   if loadavg(5min) greater than 10 for 8 cycles then
> >     restart # bad, bad, bad
> >   if cpu is greater than 50% for 2 cycles then
> >     alert # send an email to admin
> >   if cpu is greater than 80% for 3 cycles then
> >     restart
> >   if 10 restarts within 10 cycles then
> >     timeout
> >
> > Instead of ..
> >
> > <snip>
> >   if cpu is greater than 50% for 2 cycles then
> >     alert # send an email to admin
> >   if cpu is greater than 80% for 3 cycles then
> > </snip>
> >
> > do this ...
> >
> > <snip>
> >   if cpu is greater than 50% for 2 cycles then
> >     alert # complain about it
> >   if cpu is greater than 80% for 5 cycles then
> >     sigkill
> >     sleep 5 # enough time to kill all 8 mongrel pids
> >     start_fresh
> > </snip>
> >
> > #so it would look like this... you get the idea.
> > #my monit service for mongrels
> > check process mongrel_9006
> >   with pidfile /var/run/mongrel_cluster/mongrel.9006.pid
> >   start program = "/usr/local/bin/mongrel_rails cluster::start -C /etc/mongrel_cluster/mongrel_cluster.yml --clean --only 9006"
> >   stop program = "/usr/local/bin/mongrel_rails cluster::stop -C /etc/mongrel_cluster/mongrel_cluster.yml --clean --only 9006"
> >
> >   kill_the_bastard = "kill -9 <pid>" # hmpf...
> >
> >   if totalmem is greater than 110.0 MB for 3 cycles then
> >     restart # eating up memory?
> >   if loadavg(5min) greater than 10 for 8 cycles then
> >     restart # bad, bad, bad
> >   if cpu is greater than 50% for 2 cycles then
> >     alert # complain about it
> >
> >   if cpu is greater than 80% for 5 cycles then
> >     kill_the_bastard
> >   # I am assuming that if it is killed, then monit will start it
> >
> >   if 10 restarts within 10 cycles then
> >     timeout
> >
> > So, question: does 'timeout' actually send a SIGTERM/SIGHUP to the
> > process, or does it just execute the stop command for that particular
> > service?
> >
> > How are you guys handling stale pids with monit, in the case that
> > executing stop/restart doesn't work?
> >
> > Is what I am suggesting even possible?
> >
> >
> --
> To unsubscribe:
> http://lists.nongnu.org/mailman/listinfo/monit-general
>
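
For the archive, here is roughly how the pid-file variant of the exec action would slot into the service definition quoted above. This is an untested sketch: the paths and thresholds are simply copied from that config, and it assumes that once the pid is gone monit treats the process as not running and starts it again, which is worth checking against the monit documentation for your version.

```
# Sketch only, untested. Paths and thresholds are copied from the quoted config.
check process mongrel_9006
  with pidfile /var/run/mongrel_cluster/mongrel.9006.pid
  start program = "/usr/local/bin/mongrel_rails cluster::start -C /etc/mongrel_cluster/mongrel_cluster.yml --clean --only 9006"
  stop program = "/usr/local/bin/mongrel_rails cluster::stop -C /etc/mongrel_cluster/mongrel_cluster.yml --clean --only 9006"

  if totalmem is greater than 110.0 MB for 3 cycles then restart
  if loadavg(5min) greater than 10 for 8 cycles then restart
  if cpu is greater than 50% for 2 cycles then alert

  # The exec suggestion from the reply: SIGKILL the pid recorded in the pid file.
  # Assumption: monit then notices the process is not running and starts it again.
  if cpu is greater than 80% for 5 cycles then
    exec "/bin/bash -c 'kill -9 `cat /var/run/mongrel_cluster/mongrel.9006.pid`'"

  # Per the reply: timeout disables monitoring and sends an alert.
  if 10 restarts within 10 cycles then timeout
```

The pid-file form only touches the instance on port 9006, whereas the pkill form selects by process name rather than by pid file.
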
--
Michael Steinfeld
Linux Admin/Developer
AIM: mikesteinfeld
GTALK: [EMAIL PROTECTED]
