Hello! I'm using Monit to monitor some processes, and can't seem to get my simple configuration working correctly. When my threshold is met, I end up getting sent constant "failed to stop" messages.
Here is the output in my logs:

------------------------------------------------------------------------
monit[4823]: 'thin8007' total mem amount of 205988kB matches resource limit [total mem amount>163840kB]
monit[4823]: 'thin8007' trying to restart
monit[4823]: 'thin8007' stop: /usr/bin/kill
monit[4823]: 'thin8007' failed to stop
------------------------------------------------------------------------

Here is my configuration:

------------------------------------------------------------------------
set daemon 20
set logfile syslog facility log_daemon

check process thin8007 with pidfile /shared/pids/thin.8007.pid
  start program = "/usr/bin/thin start -C /etc/thin/application.yml --only 8000"
  stop program = "/usr/bin/kill -9 `cat /shared/pids/thin.8007.pid` && rm -f /shared/pids/thin.8007.pid"
  if totalmem > 160.0 MB for 1 cycles then restart
  if cpu > 90% for 1 cycles then restart
  group thin
------------------------------------------------------------------------

As you can see, the "stop" directive is a bit of a brute-force method. Before that, I was using the "stop" command of the application (thin) that I'm trying to monitor. I ran into a problem where the application wouldn't clean up after itself and would leave stale pid files around. So I decided to SIGKILL the process and remove the pid file manually.

If I run the stop command manually, the process is killed and the pid file is gone. However, when it is run through Monit, I get the "failed to stop" message. Monit runs as root on this system, but it still seems like it could be a permissions issue.

Is there any way to get more verbose output about why it "failed to stop"?
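For reference, here is a minimal sketch of the manual stop sequence described above, written as a standalone script. The function name `stop_by_pidfile` and the idea of a wrapper script are illustrative assumptions, not something Monit or thin provides. Putting the steps in a script also avoids relying on the backticks and `&&` being interpreted by a shell, which is an assumption whenever the command line is launched by another program.

```shell
#!/bin/sh
# Hypothetical wrapper: SIGKILL the process named in a pid file, then
# remove the pid file. The name stop_by_pidfile is illustrative only.
stop_by_pidfile() {
    pidfile="$1"

    # Nothing to do when the pid file does not exist.
    [ -f "$pidfile" ] || return 0

    pid=$(cat "$pidfile")

    # Send SIGKILL; ignore the error if the process is already gone.
    if [ -n "$pid" ]; then
        kill -9 "$pid" 2>/dev/null || true
    fi

    # Remove the pid file so no stale copy is left behind.
    rm -f "$pidfile"
}

# Act only when a pid file path is passed as the first argument.
if [ "$#" -ge 1 ]; then
    stop_by_pidfile "$1"
fi
```

Saved under a path of your choosing (for example a hypothetical /usr/local/bin/stop_thin.sh), it would be called with the pid file as its only argument: /shared/pids/thin.8007.pid.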
Is there anything that Monit could glean from the output of the system calls it makes? I'd be happy to patch if that was a possibility!

Any suggestions would be welcome!

Thanks!

== Dylan
