[monit-dev] "Execution failed"

Brian Candler Tue, 27 May 2008 01:29:33 -0700

I'm running monit 5.0_beta1, and just thought I'd report an anomaly I've now
seen several times.


Sometimes when a process starts, its status is reported as "Execution failed"
(in red in the web interface) and stays that way. However, the process is
running just fine, its pid file is there, and its monitoring status is
"monitored".

Here's an example from one system right now:

# monit status
...
Process 'radsub_subscriber'
  status                            Execution failed
  monitoring status                 monitored
  pid                               3608
  parent pid                        1
  uptime                            1d 12h 48m
  childrens                         0
  memory kilobytes                  18376
  memory kilobytes total            18376
  memory percent                    1.7%
  memory percent total              1.7%
  cpu percent                       2.6%
  cpu percent total                 2.6%
  data collected                    Tue May 27 09:17:36 2008
...

# ps auxwww | grep radsub | grep -v grep
root      3608  2.7  1.7 22508 18376 ?       S    May25  59:43 /usr/bin/ruby bin/radsub.rb /u/apps/radsub/shared/log/radacct
# cat /etc/monit.d/radsub.monitrc
check process radsub_subscriber
  with pidfile /u/apps/radsub/shared/pids/subscriber.pid
  start program = "/bin/sh -c 'echo $$ > /u/apps/radsub/shared/pids/subscriber.pid;
                   cd /u/apps/radsub/current;
                   exec /usr/bin/ruby bin/radsub.rb /u/apps/radsub/shared/log/radacct 2>>/u/apps/radsub/shared/log/radsub.log'"
  stop program  = "/bin/sh -c 'kill `cat /u/apps/radsub/shared/pids/subscriber.pid`'"
  if totalmem is greater than 30.0 MB for 4 cycles then restart
  if totalcpu is greater than 30% for 4 cycles then restart
  if 10 restarts within 10 cycles then timeout
  group radsub
# cat /u/apps/radsub/shared/pids/subscriber.pid
3608
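
For readers unfamiliar with the start program above: it writes the shell's own pid ($$) to the pid file and then uses exec, so the ruby process replaces the shell and keeps that same pid. That is why the pid file and the ps output agree on 3608. A minimal sketch of the same idiom, where the function name start_with_pidfile is made up for illustration and is not part of monit or of the radsub setup:

```shell
#!/bin/sh
# Hypothetical helper illustrating the "echo $$ > pidfile; exec cmd" idiom.
# Usage: start_with_pidfile PIDFILE COMMAND [ARGS...]
start_with_pidfile() {
    # Inside the inner shell, $0 is PIDFILE and "$@" is the command.
    # exec replaces the inner shell with the command, so the pid that was
    # written to PIDFILE becomes the pid of the command itself.
    sh -c 'echo $$ > "$0"; exec "$@"' "$@" &
}
```

After calling it, $! in the calling shell and the contents of the pid file should be the same number.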

On a different system, it's apache which is in this state:

# monit status
...
Process 'apache'
  status                            Execution failed
  monitoring status                 monitored
  pid                               2784
  parent pid                        1
  uptime                            2d 5h 19m
  childrens                         8
  memory kilobytes                  4184
  memory kilobytes total            35252
  memory percent                    0.4%
  memory percent total              3.4%
  cpu percent                       0.0%
  cpu percent total                 0.0%
  port response time                0.052s to localhost:443 [HTTP via TCPSSL]
  port response time                0.002s to localhost:80 [HTTP via TCP]
  data collected                    Tue May 27 09:22:09 2008

File 'httpd.conf'
  status                            accessible
  monitoring status                 monitored
  permission                        644
  uid                               0
  gid                               0
  timestamp                         Mon Apr 28 14:53:22 2008
  size                              34742 B
  checksum                          71ef1c79f56dfcf96a02497b7bc3590c(MD5)
  data collected                    Tue May 27 09:22:09 2008

Directory 'httpd.conf.d'
  status                            accessible
  monitoring status                 monitored
  permission                        755
  uid                               0
  gid                               0
  timestamp                         Fri May  9 15:14:04 2008
  data collected                    Tue May 27 09:22:09 2008
...

# ps auxwww | grep 2784 | grep -v grep
root      2784  0.0  0.4  9452 4184 ?        Ss   May14   0:11 /usr/sbin/httpd
# cat /etc/monit.d/apache.monitrc
check process apache
  with pidfile "/var/run/httpd.pid"
  start program = "/etc/init.d/httpd start"
  stop program = "/etc/init.d/httpd stop"
  if 2 restarts within 3 cycles then timeout
  if totalmem > 150 Mb then alert
  if children > 255 for 5 cycles then stop
  if totalcpu usage > 95% for 3 cycles then restart
  if failed port 80 protocol http then restart
  if failed port 443 type TCPSSL proto http then restart
  group server
  depends on httpd.conf, httpd.conf.d

check file httpd.conf
  with path /etc/httpd/conf/httpd.conf
  # Reload apache if the httpd.conf file was changed
  if changed checksum
    then exec "/etc/init.d/httpd graceful"

check directory httpd.conf.d
  with path /etc/httpd/conf.d
  if changed timestamp
    then exec "/etc/init.d/httpd graceful"
# cat /var/run/httpd.pid
2784

However, the first system is also running apache with an identical monit
configuration, and there apache's status is "running", as I'd expect. So
the problem appears to be intermittent: a service only occasionally gets
stuck in this state.

Has this issue been observed before? If not, is there anything I can do to
help track it down?
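
One way to collect data on when this happens might be to cross-check monit's reported status against the live process. The sketch below is only an illustration: the helper names failed_services and report_alive are made up, and the parsing assumes that "monit status" output is laid out like the listings in this message.

```shell
#!/bin/sh
# Hypothetical helpers, not part of monit. They assume `monit status` output
# laid out like the 5.0_beta1 listings in this message.

# Read `monit status` text on stdin and print "name pid" for every Process
# block whose status line says "Execution failed".
failed_services() {
    awk -v q="'" '
        /^[^ ]/        { failed = 0 }                     # unindented line: new block
        /^Process /    { name = $2; gsub(q, "", name) }   # strip quotes from the name
        $1 == "status" { failed = ($0 ~ /Execution failed/) }
        $1 == "pid" && failed { print name, $2 }
    '
}

# Read "name pid" lines on stdin and report those whose pid is still alive.
# kill -0 sends no signal; it only tests whether the pid can be signalled,
# so this should be run as root (or as the owner of the processes).
report_alive() {
    while read -r name pid; do
        if kill -0 "$pid" 2>/dev/null; then
            echo "$name: status is Execution failed but pid $pid is alive"
        fi
    done
}
```

Running something like "monit status | failed_services | report_alive" periodically and logging the output with a timestamp could help narrow down when a service enters the stuck state.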

Thanks,

Brian.


_______________________________________________
monit-dev mailing list
monit-dev@nongnu.org
http://lists.nongnu.org/mailman/listinfo/monit-dev
