Check program problem

Dmitry Zamaruev Mon, 19 Nov 2012 03:12:27 -0800

Hi,

I'm using 'check program' to monitor a thread leak in one of our
applications. Everything works well, except that the application is always
restarted twice. I dug through the source code and found that this appears
to be related to how 'check program' is handled.
Here is my configuration example:


# 'thread_check' is a placeholder; monit requires a unique service name here
check program thread_check with path '/tmp/script.sh'
  if status != 0 then exec '/tmp/some_service.sh restart'

Here is the workflow I'm seeing:

- Poll period #1:
  - start /tmp/script.sh

- Poll period #2:
  - collect exit code from /tmp/script.sh
  - raise event with status = 1
  - start /tmp/script.sh  <<== problem here: the script runs against the service
    before the restart, so it will return status = 1 again
  - process event - exec '/tmp/some_service.sh restart'

- Poll period #3:
  - collect exit code from /tmp/script.sh
  - raise event with status = 1
  - start /tmp/script.sh  <<== here the script runs against the fresh service,
    after the restart in poll period #2
  - process event - exec '/tmp/some_service.sh restart'

- Poll period #4:
  - collect exit code from /tmp/script.sh
  - exit status == 0, so all ok now
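
To make the ordering above easier to follow, here is a small toy model of
the poll loop in Python. It is not monit's code, only a sketch of the
sequence described in this message. It assumes that the script finishes
within one poll period, that the restart fixes the leak immediately, and
that the service starts out leaking. The function name and the
process_event_first flag are made up for illustration.

```python
def simulate(periods=4, process_event_first=False):
    """Replay the poll loop; return the poll periods in which a restart ran."""
    leaking = True    # assumed initial state: the service is leaking threads
    pending = None    # exit code of the script started in the previous period
    restarts = []

    for period in range(1, periods + 1):
        collected = pending                       # collect exit code from the last run
        failed = collected is not None and collected != 0

        if failed and process_event_first:        # hypothetical order: restart first
            leaking = False
            restarts.append(period)

        pending = 1 if leaking else 0             # start the script against the current state

        if failed and not process_event_first:    # order described above: restart last
            leaking = False
            restarts.append(period)

    return restarts


print(simulate())                          # [2, 3]
print(simulate(process_event_first=True))  # [2]
```

With the order described above, the script started in poll period #2 still
sees the leaking service, so a second restart follows in period #3. If the
event were processed before the script is started again, the model shows
only one restart.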

If I try a different condition, for example 'status == 1 for 2 cycles',
the event chain just gets longer: after two failures monit restarts the
application, but because the next poll cycle also reports a failure (three
failed cycles in a row), 'status == 1 for 2 cycles' still matches and the
application is restarted again.
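
The same effect with a 'for N cycles' condition can be sketched with a toy
model in Python. Again, this is not monit's code. It assumes that the
failure streak is not reset when the action runs (which is the behaviour
I am observing), that the restart fixes the leak immediately, and that the
script result is collected one poll period after the script is started.
The function and parameter names are made up for illustration.

```python
def restart_periods(cycles_required, periods=6):
    """Return the poll periods in which the restart action would run."""
    leaking = True    # assumed initial state: the service is leaking threads
    pending = None    # exit code of the script started in the previous period
    streak = 0        # consecutive failed cycles seen so far
    restarts = []

    for period in range(1, periods + 1):
        collected = pending
        pending = 1 if leaking else 0    # script started now sees the current state
        if collected is None:
            continue                     # first period: nothing to collect yet

        streak = streak + 1 if collected != 0 else 0
        if streak >= cycles_required:    # 'status == 1 for N cycles' matches
            leaking = False              # exec '/tmp/some_service.sh restart'
            restarts.append(period)

    return restarts


print(restart_periods(1))  # [2, 3]
print(restart_periods(2))  # [3, 4]
```

In both cases the stale result collected right after the first restart
extends the failure streak, so the condition matches once more.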

Is there any way to work around the double restart (a restart takes up to
15-20 seconds) using monit configuration, either by ignoring the exit
status at some step or by writing a special condition?

wbr,
Dmitry.

--
To unsubscribe:
https://lists.nongnu.org/mailman/listinfo/monit-general
