Hello!

I'm using Monit to monitor some processes, and can't seem to get my simple
configuration working correctly. When my memory threshold is exceeded, I get
a constant stream of "failed to stop" messages.

Here is the output in my logs:
---------------------------------------------------------------------------------------------------------------------------------------------------------
monit[4823]: 'thin8007' total mem amount of 205988kB matches resource limit [total mem amount>163840kB]
monit[4823]: 'thin8007' trying to restart
monit[4823]: 'thin8007' stop: /usr/bin/kill
monit[4823]: 'thin8007' failed to stop
---------------------------------------------------------------------------------------------------------------------------------------------------------

Here is my configuration:
---------------------------------------------------------------------------------------------------------------------------------------------------------
set daemon  20
set logfile syslog facility log_daemon
check process thin8007 with pidfile /shared/pids/thin.8007.pid
  start program = "/usr/bin/thin start -C /etc/thin/application.yml --only 8000"
  stop program  = "/usr/bin/kill -9 `cat /shared/pids/thin.8007.pid` && rm -f /shared/pids/thin.8007.pid"
  if totalmem > 160.0 MB for 1 cycles then restart
  if cpu > 90% for 1 cycles then restart
  group thin
---------------------------------------------------------------------------------------------------------------------------------------------------------

As you can see, the "stop" directive is a bit of a brute-force method. Before
that, I was using the "stop" command of the application (thin) I'm trying to
monitor. I ran into a problem where the application wouldn't clean up after
itself and left stale pid files around, so I decided to SIGKILL the process
and remove the pid file manually.

If I run the stop command manually from a shell, the process is killed and the
pid file is gone. However, when it is run through Monit, I get the "failed to
stop" message. Monit runs as root on this system, but it still seems like it
could be a permissions issue.

Is there any way to get more verbose output about why it "failed to stop"?
Is there anything that Monit could glean from the output of the system calls
it makes? I'd be happy to patch if that is a possibility!
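
One possible cause worth ruling out: as far as I understand it, Monit executes
the start and stop programs directly rather than through a shell, so the
backticks and "&&" in my stop line may be handed to kill as literal arguments.
Below is a minimal sketch of moving that logic into a small script instead. The
script name and location are hypothetical, and this is an untested assumption
about the cause, not a confirmed fix.

```shell
#!/bin/sh
# Hypothetical helper script (e.g. saved as /usr/local/bin/stop_by_pidfile.sh).
# Kills the process named in a pid file and removes the file afterwards.

stop_by_pidfile() {
    pidfile="$1"
    # Nothing to do if the pid file is already gone.
    [ -f "$pidfile" ] || return 0
    pid=$(cat "$pidfile")
    # SIGKILL the process; ignore the error if it has already exited.
    if [ -n "$pid" ]; then
        kill -9 "$pid" 2>/dev/null || true
    fi
    # Remove the (possibly stale) pid file in either case.
    rm -f "$pidfile"
}

# Run only when a pid file path is given as the first argument.
if [ -n "${1:-}" ]; then
    stop_by_pidfile "$1"
fi
```

The stop line would then look something like
stop program = "/usr/local/bin/stop_by_pidfile.sh /shared/pids/thin.8007.pid"
(path hypothetical). Running Monit in the foreground with "monit -v -I" may
also give more detail about what it executes.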

Any suggestions would be welcome!
Thanks!
==
Dylan