Hi, I'm using 'check program' to monitor for a thread leak in one of our applications. Everything works nicely, except that the application is always restarted twice. I dug through the source code and found that this seems to be related to how 'check program' is handled. Here is my configuration example:

  check program with path '/tmp/script.sh'
      if status != 0 then exec '/tmp/some_service.sh restart'

Here is the workflow I'm seeing:

- Poll period #1:
  - start /tmp/script.sh
- Poll period #2:
  - collect exit code from /tmp/script.sh
  - raise event with status = 1
  - start /tmp/script.sh  <<== problem here: the script is run against the service before the restart, so it will return status=1
  - process event
  - exec '/tmp/some_service.sh restart'
- Poll period #3:
  - collect exit code from /tmp/script.sh
  - raise event with status = 1
  - start /tmp/script.sh  <<== here the script is run against the fresh service, after the restart at step #2
  - process event
  - exec '/tmp/some_service.sh restart'
- Poll period #4:
  - collect exit code from /tmp/script.sh
  - exit status == 0, so all is OK now

If I try a different condition, for example 'status == 1 for 2 cycles', the event chain just gets longer: after two failures monit restarts the application, but because the next poll cycle is also a failure (three failed cycles in a row), monit still matches 'status == 1 for 2 cycles'.

Is there any way to work around the double restart (a restart takes up to 15-20 seconds) using the monit configuration, either by ignoring the exit status at some step or by writing some special condition?

wbr, Dmitry.
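
P.S. To make the timing easier to follow, here is a minimal Python sketch of the poll loop as I described it above. It is only a model of the observed behaviour, not monit's actual code; the function name `simulate` and the variables `service_healthy` and `pending` are made up for illustration, and it assumes a restart always fixes the service.

```python
def simulate(cycles=4):
    """Toy model of the observed 'check program' poll loop (not monit source)."""
    service_healthy = False  # assumption: the service starts out leaking threads
    pending = None           # exit status of the script started in the previous cycle
    restarts = []            # poll periods in which the restart action was executed

    for cycle in range(1, cycles + 1):
        # 1. collect the exit code of the run started in the previous poll period
        collected = pending
        # 2. start the script again, against the service in its CURRENT state
        pending = 0 if service_healthy else 1
        # 3. process the event: status != 0 triggers the restart action
        if collected is not None and collected != 0:
            service_healthy = True  # assumption: a restart always cures the leak
            restarts.append(cycle)

    return restarts


print(simulate())  # [2, 3]
```

The second restart happens because the run started in poll period #2 was launched before the restart in that same period, so its stale failure is collected in poll period #3.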
