I'm running monit 5.0_beta1, and just thought I'd report an anomaly I've now seen several times.

Sometimes when processes start, the status is reported as "Execution failed" (in red in the web interface), and it just sticks like that. However, the process is running just fine, and its pid file is there. It's also in monitoring status "monitored". Here's an example from one system right now:

# monit status
...
Process 'radsub_subscriber'
  status                            Execution failed
  monitoring status                 monitored
  pid                               3608
  parent pid                        1
  uptime                            1d 12h 48m
  childrens                         0
  memory kilobytes                  18376
  memory kilobytes total            18376
  memory percent                    1.7%
  memory percent total              1.7%
  cpu percent                       2.6%
  cpu percent total                 2.6%
  data collected                    Tue May 27 09:17:36 2008
...

# ps auxwww | grep radsub | grep -v grep
root      3608  2.7  1.7  22508 18376 ?        S    May25  59:43 /usr/bin/ruby bin/radsub.rb /u/apps/radsub/shared/log/radacct

# cat /etc/monit.d/radsub.monitrc
check process radsub_subscriber
  with pidfile /u/apps/radsub/shared/pids/subscriber.pid
  start program = "/bin/sh -c 'echo $$ > /u/apps/radsub/shared/pids/subscriber.pid; cd /u/apps/radsub/current; exec /usr/bin/ruby bin/radsub.rb /u/apps/radsub/shared/log/radacct 2>>/u/apps/radsub/shared/log/radsub.log'"
  stop program = "/bin/sh -c 'kill `cat /u/apps/radsub/shared/pids/subscriber.pid`'"
  if totalmem is greater than 30.0 MB for 4 cycles then restart
  if totalcpu is greater than 30% for 4 cycles then restart
  if 10 restarts within 10 cycles then timeout
  group radsub

# cat /u/apps/radsub/shared/pids/subscriber.pid
3608

On a different system, it's apache which is in this state:

# monit status
...
Process 'apache'
  status                            Execution failed
  monitoring status                 monitored
  pid                               2784
  parent pid                        1
  uptime                            2d 5h 19m
  childrens                         8
  memory kilobytes                  4184
  memory kilobytes total            35252
  memory percent                    0.4%
  memory percent total              3.4%
  cpu percent                       0.0%
  cpu percent total                 0.0%
  port response time                0.052s to localhost:443 [HTTP via TCPSSL]
  port response time                0.002s to localhost:80 [HTTP via TCP]
  data collected                    Tue May 27 09:22:09 2008

File 'httpd.conf'
  status                            accessible
  monitoring status                 monitored
  permission                        644
  uid                               0
  gid                               0
  timestamp                         Mon Apr 28 14:53:22 2008
  size                              34742 B
  checksum                          71ef1c79f56dfcf96a02497b7bc3590c(MD5)
  data collected                    Tue May 27 09:22:09 2008

Directory 'httpd.conf.d'
  status                            accessible
  monitoring status                 monitored
  permission                        755
  uid                               0
  gid                               0
  timestamp                         Fri May  9 15:14:04 2008
  data collected                    Tue May 27 09:22:09 2008
...

# ps auxwww | grep 2784 | grep -v grep
root      2784  0.0  0.4   9452  4184 ?        Ss   May14   0:11 /usr/sbin/httpd

# cat /etc/monit.d/apache.monitrc
check process apache with pidfile "/var/run/httpd.pid"
  start program = "/etc/init.d/httpd start"
  stop program = "/etc/init.d/httpd stop"
  if 2 restarts within 3 cycles then timeout
  if totalmem > 150 Mb then alert
  if children > 255 for 5 cycles then stop
  if totalcpu usage > 95% for 3 cycles then restart
  if failed port 80 protocol http then restart
  if failed port 443 type TCPSSL proto http then restart
  group server
  depends on httpd.conf, httpd.conf.d

check file httpd.conf with path /etc/httpd/conf/httpd.conf
  # Reload apache if the httpd.conf file was changed
  if changed checksum then exec "/etc/init.d/httpd graceful"

check directory httpd.conf.d with path /etc/httpd/conf.d
  if changed timestamp then exec "/etc/init.d/httpd graceful"

# cat /var/run/httpd.pid
2784

However, the first system is also running apache, with an identical monit configuration. On that system, apache's status is "running", as I'd expect. So the problem appears to be intermittent: a process only occasionally gets stuck in this state.

Has this issue been observed before?
If not, is there anything I can do to help track it down?

Thanks,

Brian.
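
P.S. For anyone who wants to look for the same mismatch on their own systems, here is a minimal sketch of the manual `cat` and `ps` check shown above, wrapped in a shell function. The name pidfile_alive is made up for illustration and is not part of monit. It relies on `kill -0`, which only tests whether a signal could be delivered, so it should be run as root or as the owner of the process.

```shell
#!/bin/sh
# Hypothetical helper, not part of monit: report whether the pid recorded
# in a pidfile belongs to a live process.
# Returns 0 if alive, 1 if not running, 2 if the pidfile is missing or invalid.
pidfile_alive() {
    pidfile=$1
    if [ ! -r "$pidfile" ]; then
        echo "missing or unreadable pidfile: $pidfile"
        return 2
    fi
    pid=$(cat "$pidfile")
    # Reject anything that is not a plain decimal number.
    case $pid in
        ''|*[!0-9]*)
            echo "invalid pid in $pidfile"
            return 2
            ;;
    esac
    # kill -0 sends no signal; it only checks that the process exists
    # and that we are allowed to signal it.
    if kill -0 "$pid" 2>/dev/null; then
        echo "pid $pid is alive"
        return 0
    fi
    echo "pid $pid is not running"
    return 1
}
```

For example, `pidfile_alive /var/run/httpd.pid` can be compared against the status line that `monit status` prints for the same process.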