Re: Check program problem

Dmitry Zamaruev Mon, 19 Nov 2012 08:12:16 -0800

Nope, this will just increase the poll interval for that particular service, so
the service will still be restarted twice, only with more time between the
restarts :)


Assume we have a service running with PID=10 (stored in /tmp/file.pid) and a
script that checks whether the process named in /tmp/file.pid has fewer than
100 threads; if it does not, the script returns 1.
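
For illustration, such a check could look roughly like the Python sketch
below. This is an assumption, not the actual /tmp/script.sh: it is Linux-only,
it counts threads by listing /proc/<pid>/task, and the names count_threads,
check and proc_root are hypothetical.

```python
import os

THREAD_LIMIT = 100  # threshold taken from the description above


def count_threads(pid, proc_root="/proc"):
    """Count the threads of a process by listing <proc_root>/<pid>/task (Linux)."""
    return len(os.listdir(os.path.join(proc_root, str(pid), "task")))


def check(pid_file, limit=THREAD_LIMIT, proc_root="/proc"):
    """Return 0 if the process in pid_file has fewer than `limit` threads, else 1."""
    with open(pid_file) as f:
        pid = int(f.read().strip())
    return 0 if count_threads(pid, proc_root) < limit else 1
```

A wrapper script would pass the result to sys.exit(), e.g.
sys.exit(check("/tmp/file.pid")), so that monit sees it as the exit status.
A missing process raises FileNotFoundError here; a real script would have to
decide what to report in that case.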

Poll (cycle) #1:
- /tmp/script.sh is run against /tmp/file.pid (contains 10) and returns 1,
but this value is not collected by monit until the next cycle

Poll #2:
- monit collects status#1 and fires the event 'status != 0'
- BEFORE the event is processed, /tmp/script.sh is run again (/tmp/file.pid
still contains 10); it returns 1 again, and again the result is postponed
until the next poll period
- monit processes the exec action (because status#1 == 1) and restarts the
service (now /tmp/file.pid will contain 20, for example)

Poll #3:
- monit collects status#2 and fires the event 'status != 0', but the service
was already restarted at #2, so this value is obsolete!
- before the event is processed, /tmp/script.sh is run against /tmp/file.pid
(contains 20) and returns 0 (because it is a fresh process)
- monit processes the exec action (because status#2 == 1) and restarts the
service (now /tmp/file.pid will contain 30, for example)

Poll #4:
- monit collects status#3 and sees that it is ok


So the problem is that the 'check program' result is one step behind the other
actions, and at some point (poll #3) monit uses obsolete information to
perform actions.
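
The sequence above can be replayed with a small Python simulation. It is only
a simplified model of the ordering described in this message (collect the
previous result, start the script again, then process the event), not monit's
actual code; the function name simulate and the PID step of 10 are made up
for illustration.

```python
def simulate(polls=4):
    """Replay the deferred-collection order described above.

    Returns a list of (poll, collected_status, pid_after_poll) tuples.
    """
    pid, leaking = 10, True   # initial service: PID 10 with too many threads
    pending = None            # exit status of the script started in the last poll
    log = []
    for poll in range(1, polls + 1):
        collected = pending               # 1. collect the previous result
        pending = 1 if leaking else 0     # 2. start the script against the current service
        if collected is not None and collected != 0:
            pid, leaking = pid + 10, False  # 3. exec restart: fresh, healthy process
        log.append((poll, collected, pid))
    return log


if __name__ == "__main__":
    for poll, status, pid in simulate():
        print(f"poll #{poll}: collected status={status}, pid after poll={pid}")
```

In this model the service is restarted in both poll #2 and poll #3, because
the status collected in poll #3 was produced before the first restart.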



On Mon, Nov 19, 2012 at 5:43 PM, Jan-Henrik Haukeland
<[email protected]> wrote:

> I'm not sure I understand the problem, but that does not prevent me from
> having a suggestion :) I'm wondering if the every statement could help in
> this situation? As in:
>
> check program with path '/tmp/script.sh'
>   every 2 cycles
>   if status != 0 then exec '/tmp/some_service.sh restart'
>
> Any luck with that?
>
>
> On Nov 19, 2012, at 12:12 PM, Dmitry Zamaruev <[email protected]>
> wrote:
>
> > Hi,
> >
> > I'm using 'check program' to monitor a thread leak in one of our
> > applications. All is working nicely, except that the application is always
> > restarted twice. I dug through the source code and found that it should be
> > related to how 'check program' is handled.
> > Here is my configuration example:
> >
> > check program with path '/tmp/script.sh'
> >   if status != 0 then exec '/tmp/some_service.sh restart'
> >
> > Here is the workflow I'm seeing:
> >
> > - Poll period #1:
> >   - start /tmp/script.sh
> >
> > - Poll period #2:
> >   - collect exit code from /tmp/script.sh
> >   - raise event with status = 1
> >   - start /tmp/script.sh  <<== problem here: the script is run against the
> >     service before the restart, so it will return status=1
> >   - process event - exec '/tmp/some_service.sh restart'
> >
> > - Poll period #3
> >   - collect exit code from /tmp/script.sh
> >   - raise event with status = 1
> >   - start /tmp/script.sh  <<== here the script is run against the fresh
> >     service after the restart at step #2
> >   - process event - exec '/tmp/some_service.sh restart'
> >
> > - Poll period #4
> >   - collect exit code from /tmp/script.sh
> >   - exit status == 0, so all ok now
> >
> > If I try a different condition, for example 'status == 1 for 2 cycles',
> > the event chain just gets longer: after two failures monit restarts the
> > application, but because the next poll cycle also reports a failure
> > (three failed cycles), monit still matches 'status == 1 for 2 cycles'.
> >
> > Is there any way to work around the double restart (a restart takes up to
> > 15-20 seconds) using monit configuration, either by ignoring the exit
> > status at some step or by writing some special condition?
> >
> > wbr,
> > Dmitry.
>
>
>
> --
> To unsubscribe:
> https://lists.nongnu.org/mailman/listinfo/monit-general
>

