Hi,

monit runs the start/stop programs in a sandbox and, for security reasons, strips all environment variables; it sets only a spartan "PATH=/bin:/usr/bin:/sbin:/usr/sbin". If your script depends on any other environment variable, it may fail when executed by monit.
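One rough way to see what a command encounters under such a stripped environment is to run it by hand with "env -i" and only the PATH quoted above. This is a sketch that approximates the behaviour described here, not monit's actual mechanism; the helper name run_like_monit is hypothetical, and RAILS_ENV is exported only to make the stripping visible.

```shell
#!/bin/sh
# Approximate the environment described above: everything cleared,
# only the minimal PATH set. (Assumption: this mimics, but is not, monit.)
MONIT_PATH="/bin:/usr/bin:/sbin:/usr/sbin"

# Hypothetical helper: run a command string with an empty environment
# except for PATH.
run_like_monit() {
    env -i PATH="$MONIT_PATH" /bin/sh -c "$1"
}

# Pretend the login shell had this set, as a deploy user's profile might.
export RAILS_ENV=production

# The variable is visible in the current shell...
echo "outside: RAILS_ENV=${RAILS_ENV:-<unset>}"        # prints outside: RAILS_ENV=production

# ...but not inside the stripped environment.
run_like_monit 'echo "inside: RAILS_ENV=${RAILS_ENV:-<unset>}"'   # prints inside: RAILS_ENV=<unset>
run_like_monit 'echo "inside: PATH=$PATH"'             # prints inside: PATH=/bin:/usr/bin:/sbin:/usr/sbin
```

Passing the real start command string to the helper in the same way, as the user the service runs as, should surface missing-variable failures on the terminal rather than only in the logs.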
You should find more details in the monit log and/or your start log (/home/deployer/apps/au/current/log/sidekiq.log).

Monitoring is most probably being disabled by the following statement (you can verify this in the monit log):

--8<--
if 3 restarts within 18 cycles then timeout
--8<--

=> if the restart fails 3 times within 18 cycles, monit disables monitoring of this service (timeout).

Regards,
Martin

On Apr 18, 2013, at 2:59 PM, Niels Kristian Schjødt <[email protected]> wrote:

> Hi, I have monit setup for monitoring some background workers in my rails
> project. Whenever I deploy new code, monit should take care of restarting
> them after they are shut down. But every time I deploy, monit only starts up
> half of them. If I ssh into the server though and run "sudo monit validate"
> then it correctly sees that they are not running, and spins them up. But if I
> don't run that command manually, then nothing happens. What could be wrong, I
> have no idea how to debug it further?
>
> Here is my configs:
>
> ############################## Monitrc #################################
> set daemon 10
>
> set logfile /var/log/monit.log
> set idfile /var/lib/monit/id
> set statefile /var/lib/monit/state
>
> set eventqueue
> basedir /var/lib/monit/events
> slots 100
>
> set eventqueue basedir /var/monit/ slots 1000
> set mmonit http://monit:[email protected]:8080/collector
> set httpd port 2812 and use address 192.168.0.3
> allow localhost
> allow 192.168.0.1
> allow user:password
>
> check system master-worker-server
> if loadavg(5min) > 4 for 60 cycles then alert
> if memory > 75% for 60 cycles then alert
> if cpu(user) > 75% for 60 cycles then alert
>
> include /etc/monit/conf.d/*
>
> ############################# /etc/monit/conf.d/sidekiq.conf
> ######################################
>
> # Check for Ruby sidekiq worker process
>
> check process da_workers-0.pid with pidfile
> /home/deployer/apps/au/shared/pids/workers/da_workers-0.pid
> start program = "/bin/bash -l -c 'HOME=/home/deployer
> PATH=$HOME/.rbenv/shims:$HOME/.rbenv/bin:$PATH RAILS_ENV=production && cd
> /home/deployer/apps/au/current ; nohup bundle exec sidekiq -e production -C
> /home/deployer/apps/au/shared/config/workers/da_workers.yml -i 0 -P
> /home/deployer/apps/au/shared/pids/workers/da_workers-0.pid >>
> /home/deployer/apps/au/current/log/sidekiq.log 2>&1 &'" as uid deployer and
> gid deployer with timeout 250 seconds
> stop program = "/bin/bash -l -c 'HOME=/home/deployer
> PATH=$HOME/.rbenv/shims:$HOME/.rbenv/bin:$PATH RAILS_ENV=production && if [
> -d /home/deployer/apps/au/current ] && [ -f
> /home/deployer/apps/au/shared/pids/workers/da_workers-0.pid ] && kill -0 `cat
> /home/deployer/apps/au/shared/pids/workers/da_workers-0.pid`> /dev/null 2>&1;
> then cd /home/deployer/apps/au/current && bundle exec sidekiqctl stop
> /home/deployer/apps/au/shared/pids/workers/da_workers-0.pid 3 ; else echo
> 'Sidekiq is not running' ; fi'" as uid deployer and gid deployer with timeout
> 120 seconds
> if cpu usage > 50% for 18 cycles then restart
> if mem > 1200.0 MB for 18 cycles then restart
> if 3 restarts within 18 cycles then timeout
>
> check process da_data_maintenance_workers-0.pid with pidfile
> /home/deployer/apps/au/shared/pids/workers/da_data_maintenance_workers-0.pid
> start program = "/bin/bash -l -c 'HOME=/home/deployer
> PATH=$HOME/.rbenv/shims:$HOME/.rbenv/bin:$PATH RAILS_ENV=production && cd
> /home/deployer/apps/au/current ; nohup bundle exec sidekiq -e production -C
> /home/deployer/apps/au/shared/config/workers/da_data_maintenance_workers.yml
> -i 0 -P
> /home/deployer/apps/au/shared/pids/workers/da_data_maintenance_workers-0.pid
> >> /home/deployer/apps/au/current/log/sidekiq.log 2>&1 &'" as uid deployer
> and gid deployer with timeout 250 seconds
> stop program = "/bin/bash -l -c 'HOME=/home/deployer
> PATH=$HOME/.rbenv/shims:$HOME/.rbenv/bin:$PATH RAILS_ENV=production && if [
> -d /home/deployer/apps/au/current ] && [ -f
> /home/deployer/apps/au/shared/pids/workers/da_data_maintenance_workers-0.pid
> ] && kill -0 `cat
> /home/deployer/apps/au/shared/pids/workers/da_data_maintenance_workers-0.pid`>
> /dev/null 2>&1; then cd /home/deployer/apps/au/current && bundle exec
> sidekiqctl stop
> /home/deployer/apps/au/shared/pids/workers/da_data_maintenance_workers-0.pid
> 3 ; else echo 'Sidekiq is not running' ; fi'" as uid deployer and gid
> deployer with timeout 120 seconds
> if cpu usage > 50% for 18 cycles then restart
> if mem > 1200.0 MB for 18 cycles then restart
> if 3 restarts within 18 cycles then timeout
>
> check process da_data_collecting_workers-0.pid with pidfile
> /home/deployer/apps/au/shared/pids/workers/da_data_collecting_workers-0.pid
> start program = "/bin/bash -l -c 'HOME=/home/deployer
> PATH=$HOME/.rbenv/shims:$HOME/.rbenv/bin:$PATH RAILS_ENV=production && cd
> /home/deployer/apps/au/current ; nohup bundle exec sidekiq -e production -C
> /home/deployer/apps/au/shared/config/workers/da_data_collecting_workers.yml
> -i 0 -P
> /home/deployer/apps/au/shared/pids/workers/da_data_collecting_workers-0.pid
> >> /home/deployer/apps/au/current/log/sidekiq.log 2>&1 &'" as uid deployer
> and gid deployer with timeout 250 seconds
> stop program = "/bin/bash -l -c 'HOME=/home/deployer
> PATH=$HOME/.rbenv/shims:$HOME/.rbenv/bin:$PATH RAILS_ENV=production && if [
> -d /home/deployer/apps/au/current ] && [ -f
> /home/deployer/apps/au/shared/pids/workers/da_data_collecting_workers-0.pid ]
> && kill -0 `cat
> /home/deployer/apps/au/shared/pids/workers/da_data_collecting_workers-0.pid`>
> /dev/null 2>&1; then cd /home/deployer/apps/au/current && bundle exec
> sidekiqctl stop
> /home/deployer/apps/au/shared/pids/workers/da_data_collecting_workers-0.pid 3
> ; else echo 'Sidekiq is not running' ; fi'" as uid deployer and gid deployer
> with timeout 120 seconds
> if cpu usage > 50% for 18 cycles then restart
> if mem > 1200.0 MB for 18 cycles then restart
> if 3 restarts within 18 cycles then timeout
>
> check process da_data_collecting_workers-1.pid with pidfile
> /home/deployer/apps/au/shared/pids/workers/da_data_collecting_workers-1.pid
> start program = "/bin/bash -l -c 'HOME=/home/deployer
> PATH=$HOME/.rbenv/shims:$HOME/.rbenv/bin:$PATH RAILS_ENV=production && cd
> /home/deployer/apps/au/current ; nohup bundle exec sidekiq -e production -C
> /home/deployer/apps/au/shared/config/workers/da_data_collecting_workers.yml
> -i 1 -P
> /home/deployer/apps/au/shared/pids/workers/da_data_collecting_workers-1.pid
> >> /home/deployer/apps/au/current/log/sidekiq.log 2>&1 &'" as uid deployer
> and gid deployer with timeout 250 seconds
> stop program = "/bin/bash -l -c 'HOME=/home/deployer
> PATH=$HOME/.rbenv/shims:$HOME/.rbenv/bin:$PATH RAILS_ENV=production && if [
> -d /home/deployer/apps/au/current ] && [ -f
> /home/deployer/apps/au/shared/pids/workers/da_data_collecting_workers-1.pid ]
> && kill -0 `cat
> /home/deployer/apps/au/shared/pids/workers/da_data_collecting_workers-1.pid`>
> /dev/null 2>&1; then cd /home/deployer/apps/au/current && bundle exec
> sidekiqctl stop
> /home/deployer/apps/au/shared/pids/workers/da_data_collecting_workers-1.pid 3
> ; else echo 'Sidekiq is not running' ; fi'" as uid deployer and gid deployer
> with timeout 120 seconds
> if cpu usage > 50% for 18 cycles then restart
> if mem > 1200.0 MB for 18 cycles then restart
> if 3 restarts within 18 cycles then timeout
>
>
> --
> To unsubscribe:
> https://lists.nongnu.org/mailman/listinfo/monit-general
