Hi,

The refactoring of the test scheduler mentioned in the manual, with a fix for 
program execution, has already begun.

Regards,
Martin


> On 22 Jun 2015, at 20:05, Struan Bartlett <[email protected]> wrote:
> 
> Hi
> 
> I'd like to query the rationale for a behaviour I've been experiencing in monit. 
> I'm testing with the following config:
> 
> # Test config start
> set daemon 10
> 
> check program MyProgram with path "/bin/dash -c 'echo OK!; exit 1'"
>    every "06 * * * *"
>    if status != 0 then alert
> # Test config end
> 
> As expected, monit runs the dash test program at 6 minutes past the hour. The 
> dash script finishes immediately. However, Monit doesn't pick up, report or 
> alert on the exit code in a timely manner. Until the next time Monit is 
> scheduled to run the test script, the dash script remains as a zombie. But 
> that is an hour later, which is a long time to wait to be alerted to the 
> script failing.
> 
> If the 'every' schedule was "06 0 * * *" then it would seem one should expect 
> to wait 24 hours before being alerted to the script failing!
> 
> I realise the Monit manual explains:
> 
> "The asynchronous nature of the program check [...] comes with a side-effect: 
> when the program has finished executing and is waiting for Monit to collect 
> the result, it becomes a so-called "zombie" process [...] the zombie process 
> is removed from the system as soon as Monit collects the exit status. This 
> means that every "check program" will be associated with either a running 
> process or a temporary zombie. This unwanted zombie side-effect will be 
> removed in a later release of Monit."
> 
> That may be so, however why doesn't Monit reap the child and collect the exit 
> code at the *next poll cycle after the child exits* (i.e. within 10 seconds 
> of the test script finishing given the 'set daemon 10' line in the test 
> config above) rather than when the program is next scheduled to be run? Maybe 
> I'm missing something, but the current behaviour seems to undermine the 
> entire purpose of providing alerts on program failure (when used in 
> conjunction with cron-style scheduling). That is the behaviour I'd like to 
> query the rationale for.
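
The behaviour asked for above (collecting the exit status at the next poll cycle after the child exits) can be sketched as follows. This is only an illustration of the idea in Python, not Monit's implementation: the function name run_and_reap and its parameters are hypothetical, and /bin/sh stands in for the /bin/dash test program from the config.

```python
import subprocess
import time


def run_and_reap(argv, poll_interval=0.1, max_cycles=50):
    """Start argv, then check once per poll cycle whether it has exited.

    Hypothetical helper for illustration only. Returns a tuple of
    (cycle in which the exit was noticed, exit status, captured stdout).
    """
    child = subprocess.Popen(argv, stdout=subprocess.PIPE)
    for cycle in range(1, max_cycles + 1):
        time.sleep(poll_interval)  # stands in for the 'set daemon' interval
        # poll() is non-blocking: it returns None while the child is still
        # running, otherwise it reaps the child and returns its exit status,
        # so no zombie is left waiting for the next scheduled run.
        status = child.poll()
        if status is not None:
            output = child.stdout.read().decode()
            child.stdout.close()
            return cycle, status, output
    # The child outlived the polling budget: terminate and reap it.
    child.kill()
    child.wait()
    child.stdout.close()
    raise TimeoutError("child did not exit within the polling budget")


if __name__ == "__main__":
    cycle, status, output = run_and_reap(["/bin/sh", "-c", "echo OK!; exit 1"])
    print(f"exit status {status} collected in poll cycle {cycle}: {output!r}")
```

With this approach the delay before the failure is noticed is bounded by the poll interval rather than by the cron-style schedule of the check.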
> 
> Thanks in advance.
> 
> Kind regards
> 
> Struan
> 
> -- 
> Struan Bartlett
> NewsNow Publishing Limited
> 
> --
> To unsubscribe:
> https://lists.nongnu.org/mailman/listinfo/monit-general
