Nope, this will just increase the poll interval for that particular service, so the service will still be restarted twice, only with more time between the restarts :)
Assume we have a service running with PID=10 (stored in /tmp/file.pid), and a script that checks whether the process named in /tmp/file.pid has fewer than 100 threads; if not, it returns 1.

Poll (cycle) #1:
- /tmp/script.sh is run against /tmp/file.pid (contains 10) and returns 1, but this value is not collected by monit until the next cycle

Poll #2:
- monit collects status#1 and fires the event 'status != 0'
- BEFORE the event is processed, /tmp/script.sh is run again (/tmp/file.pid still contains 10); the return value is 1 again, and again it is postponed until the next poll period
- monit processes the exec action (because status#1 == 1) and restarts the service (now /tmp/file.pid will contain 20, for example)

Poll #3:
- monit collects status#2 and fires the event 'status != 0', but the service was already restarted at #2, so this value is obsolete!
- before the event is processed, /tmp/script.sh is run against /tmp/file.pid (contains 20) and returns 0 (because it is a fresh process)
- monit processes the exec action (because status#2 == 1) and restarts the service (now /tmp/file.pid will contain 30, for example)

Poll #4:
- monit collects status#3 and sees that it is ok

So the problem is that the 'check program' result is one step behind the other actions, and at some point (poll #3) monit uses obsolete information to perform an action.

On Mon, Nov 19, 2012 at 5:43 PM, Jan-Henrik Haukeland <[email protected]> wrote:
> I'm not sure I understand the problem, but that does not prevent me from
> having a suggestion :) I'm wondering if the every statement could help in
> this situation? As in:
>
>  check program with path '/tmp/script.sh'
>    every 2 cycles
>    if status != 0 then exec '/tmp/some_service.sh restart'
>
> Any luck with that?
>
>
> On Nov 19, 2012, at 12:12 PM, Dmitry Zamaruev <[email protected]> wrote:
>
> > Hi,
> >
> > I'm using 'check program' to monitor thread leak in one of our
> > applications. All is working nice, except that application is always
> > restarted twice. I dig through source code and found that it should be
> > related to how 'check program' is handled.
> >
> > Here is my configuration example:
> >
> > check program with path '/tmp/script.sh'
> >   if status != 0 then exec '/tmp/some_service.sh restart'
> >
> > Here is the workflow I'm seeing:
> >
> > - Poll period #1:
> >   - start /tmp/script.sh
> >
> > - Poll period #2:
> >   - collect exit code from /tmp/script.sh
> >   - raise event with status = 1
> >   - start /tmp/script.sh <<== problem here, script is run against
> >     service before restart! so it will return status=1
> >   - process event - exec '/tmp/some_service.sh restart'
> >
> > - Poll period #3
> >   - collect exit code from /tmp/script.sh
> >   - raise event with status = 1
> >   - start /tmp/script.sh <<== here script is run against fresh service
> >     after restart at step #2
> >   - process event - exec '/tmp/some_service.sh restart'
> >
> > - Poll period #4
> >   - collect exit code from /tmp/script.sh
> >   - exit status == 0, so all ok now
> >
> > If I try to use different condition, for example 'status == 1 for 2
> > cycles' - this event chain will be just longer, i.e. after two failures it
> > will restart application, but because next poll cycle is also "failure" -
> > three failed cycles, monit will still successfully match against 'status ==
> > 1 for 2 cycles'.
> >
> > Is there any way to workaround double restart (time for restart is up to
> > 15-20 seconds) using monit configuration, either ignoring exit status on
> > some step, or writing some special condition ?
> >
> > wbr,
> > Dmitry.
> >
>
> --
> To unsubscribe:
> https://lists.nongnu.org/mailman/listinfo/monit-general
>
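P.S. To make the poll sequence at the top of this message easier to follow, here is a toy simulation of it in Python. It is only a sketch of the ordering as described in this thread, not monit's actual code: the function name simulate, the start_before_action switch, and the "PID grows by 10 on each restart" rule are all made up for illustration. With start_before_action=True it models the order described here (collect the old status, start the script again, then process the event). With start_before_action=False it models a purely hypothetical reordering, which is not a monit option as far as I know, just to show that the ordering alone explains the second restart.

```python
def simulate(cycles=4, start_before_action=True):
    """Toy model of the 'check program' poll loop described in this thread.

    Returns the list of poll cycles in which the restart action ran.
    All names and rules here are illustrative assumptions, not monit internals.
    """
    pid = 10          # current contents of /tmp/file.pid
    leaking = {10}    # PIDs whose process is over the thread limit
    pending = None    # exit status of the script started in the previous cycle
    restarts = []     # cycles in which '/tmp/some_service.sh restart' ran

    def run_script():
        # Stand-in for /tmp/script.sh: 1 if the current process leaks, else 0.
        return 1 if pid in leaking else 0

    for cycle in range(1, cycles + 1):
        collected = pending  # status produced one cycle ago (None in cycle 1)

        if start_before_action:
            # Order described in the thread: the script is started again
            # BEFORE the event for the collected status is processed.
            pending = run_script()

        if collected is not None and collected != 0:
            pid += 10        # restart: the service comes back with a fresh PID
            restarts.append(cycle)

        if not start_before_action:
            # Hypothetical order: the script only starts after the action.
            pending = run_script()

    return restarts


if __name__ == "__main__":
    print("described order:   ", simulate(start_before_action=True))
    print("hypothetical order:", simulate(start_before_action=False))
```

In the described order the restart runs in polls #2 and #3, because the status collected in poll #3 was produced against the old PID. In the hypothetical order it runs only in poll #2.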
