Is the fix from 5.1.1 also included in the latest monit version? If so, we will upgrade to the latest release.

On Wed, Feb 24, 2010 at 6:56 PM, Martin Pala <[email protected]> wrote:
> Thanks for data.
>
> It seems to me that the problem could be that service start was requested
> before the service managed to stop and the start flag was reset after stop =>
> the service stayed in unmonitored mode (as result of stop). To confirm this
> there should be however additional log about stop program result (depending
> on result either "stopped" or "failed to stop"):
>
> 1.) are you able to reproduce the issue at will?
>
> 2.) please upgrade to monit-5.1.1 ... there is following fix which could play
> role as it seems that the pending stop was woken up by start
> --8<--
> * Fixed #27784: wait_start/wait_stop can advance too quickly.
> Thanks to Randy Puro for report.
> --8<--
>
> (you can get monit-5.1.1 here:
> http://www.mmonit.com/monit/dist/monit-5.1.1.tar.gz)
>
>
> ... i'll try to replicate the problem in parallel
>
> Best regards,
> Martin
>
>
> On Feb 24, 2010, at 4:20 PM, David Bristow wrote:
>
>> Here is a copy of the configuration for backgroundrb:
>>
>> check process backgroundrb with pidfile
>> /home/rails/ideeli/qa/current/tmp/pids/backgroundrb_8888.pid
>> group backgroundrb
>> start program = "/usr/local/bin/backgroundrb_wrapper start qa
>> /home/rails/ideeli/qa/current/tmp/pids/backgroundrb_8888.pid" with
>> timeout 40 seconds
>> stop program = "/usr/local/bin/backgroundrb_wrapper stop qa
>> /home/rails/ideeli/qa/current/tmp/pids/backgroundrb_8888.pid" with
>> timeout 60 seconds
>> if memory > 240 Mb then restart
>>
>> There are no more interesting things in the logs at around this time.
>> Nothing related to backgroundrb, at least.
>>
>> On Mon, Feb 22, 2010 at 4:56 PM, Martin Pala <[email protected]> wrote:
>>> Hi David,
>>>
>>> the service is unmonitored on stop ... the service start enables monitoring
>>> again, so it's not expected to see unmonitored service after start.
>>>
>>> It seems to me that your 'backgroundrb' service has no "start program =
>>> ..." in your monit config file. If the "start program" would be defined, it
>>> should log similar message to "'backgroundrb' stop:
>>> /usr/local/bin/backgroundrb_wrapper", but with "start" word instead of
>>> "stop". The message is missing in the log so it was logged either past
>>> 11:45:32 (which is likely if start is defined) or start program is not
>>> defined and thus service was not started - check maybe timed out (don't
>>> know your configuration so i cannot say) ... or maybe somebody stopped it
>>> again.
>>>
>>> Please can you provide full monit configuration for 'backgroundrb' service
>>> and rest of debug log between 11:44:48 and 12:08:33?
>>>
>>> Are you able to reproduce the issue at will? I tried to replicate the
>>> problem but it works fine for me.
>>>
>>> Best regards,
>>> Martin
>>>
>>>
>>>
>>> On Feb 22, 2010, at 3:02 PM, David Bristow wrote:
>>>
>>>> We are having trouble with certain services managed by monit that do
>>>> not restart as they should after being shut down and then started up
>>>> again.
>>>>
>>>> For example, we use backgroundrb. Someone shut it down for updating,
>>>> and started it up afterwards. Here is a sample section of the
>>>> monit.log that shows what was happening at the time:
>>>>
>>>> [EST Feb 19 11:44:48] debug : stop service 'backgroundrb' on user
>>>> request
>>>> [EST Feb 19 11:44:48] info : monit daemon at 19023 awakened
>>>> [EST Feb 19 11:45:10] error : 'syslog-ng' failed to start
>>>> [EST Feb 19 11:45:10] info : 'backgroundrb' stop:
>>>> /usr/local/bin/backgroundrb_wrapper
>>>> [EST Feb 19 11:45:19] debug : start service 'backgroundrb' on user
>>>> request
>>>> [EST Feb 19 11:45:19] info : monit daemon at 19023 awakened
>>>> [EST Feb 19 11:45:31] info : 'backgroundrb' start action done
>>>> [EST Feb 19 11:45:32] info : Awakened by User defined signal 1
>>>>
>>>> And at 12:09AM, this is the "monit status" for backgroundrb:
>>>>
>>>> Process 'backgroundrb'
>>>> status not monitored
>>>> monitoring status not monitored
>>>> data collected Fri Feb 19 12:08:33 2010
>>>>
>>>> Why does this happen? We are using monit 5.0.3.
>>>>
>>>> --
>>>> David Bristow <[email protected]>
>>>
>>
>>
>>
>> --
>> David Bristow <[email protected]>
>
>

--
David Bristow <[email protected]>


--
To unsubscribe:
http://lists.nongnu.org/mailman/listinfo/monit-general
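
For anyone trying to confirm Martin's hypothesis on their own logs, here is a rough sketch (not part of monit) that scans monit.log lines for a user-requested start arriving before any stop result was logged for the same service. The function names are made up for illustration, and it assumes that the "stopped" / "failed to stop" messages Martin mentions begin with the quoted service name like the other lines in the log excerpt; the year is also an assumption, since the log timestamps do not carry one.

```python
import re
from datetime import datetime

# Matches lines such as: [EST Feb 19 11:44:48] debug : stop service 'x' on user request
LINE_RE = re.compile(r"\[\w+ (\w+ \d+ \d+:\d+:\d+)\]\s+(\w+)\s*:\s*(.*)")

# Log excerpt from the thread, with the mail client's line wrapping undone.
SAMPLE = [
    "[EST Feb 19 11:44:48] debug : stop service 'backgroundrb' on user request",
    "[EST Feb 19 11:44:48] info : monit daemon at 19023 awakened",
    "[EST Feb 19 11:45:10] error : 'syslog-ng' failed to start",
    "[EST Feb 19 11:45:10] info : 'backgroundrb' stop: /usr/local/bin/backgroundrb_wrapper",
    "[EST Feb 19 11:45:19] debug : start service 'backgroundrb' on user request",
    "[EST Feb 19 11:45:19] info : monit daemon at 19023 awakened",
    "[EST Feb 19 11:45:31] info : 'backgroundrb' start action done",
    "[EST Feb 19 11:45:32] info : Awakened by User defined signal 1",
]


def parse(lines, year=2010):
    """Return (timestamp, level, message) tuples for lines that match LINE_RE."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            # The log has no year, so one is supplied (assumed) for parsing.
            ts = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
            events.append((ts, m.group(2), m.group(3)))
    return events


def start_before_stop_result(events, service):
    """Seconds between a stop request and a start request that arrived
    before any stop result was logged, or None if no such pair exists."""
    stop_requested = None
    stop_result_seen = False
    for ts, _level, msg in events:
        if msg.startswith(f"stop service '{service}'"):
            stop_requested = ts
            stop_result_seen = False
        elif msg.startswith((f"'{service}' stopped", f"'{service}' failed to stop")):
            # Assumed wording of the stop result lines (see note above).
            stop_result_seen = True
        elif msg.startswith(f"start service '{service}'"):
            if stop_requested is not None and not stop_result_seen:
                return (ts - stop_requested).total_seconds()
            stop_requested = None
    return None


if __name__ == "__main__":
    print(start_before_stop_result(parse(SAMPLE), "backgroundrb"))
```

On the excerpt above, the start request comes 31 seconds after the stop request with no stop result in between, which is inside the 60 second stop timeout from the configuration. That is consistent with the race Martin describes, though it does not prove it.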
