There are many more changes in progress (process engine refactoring, etc.) … may take ca. 1-2 months.
> On 22 Jun 2015, at 22:42, Struan Bartlett <[email protected]> wrote:
>
> Thanks for the swift response, and that's great to hear. Based on your
> experience, do you think this is likely to take weeks or months before being
> available?
>
> On 22/06/2015 20:13, Martin Pala wrote:
>> Hi,
>>
>> the refactoring of the test scheduler mentioned in the manual, with a fix
>> for program execution, has already begun.
>>
>> Regards,
>> Martin
>>
>>
>>> On 22 Jun 2015, at 20:05, Struan Bartlett <[email protected]
>>> <mailto:[email protected]>> wrote:
>>>
>>> Hi
>>>
>>> I'd like to query the rationale for a behaviour I'm experiencing in monit.
>>> I'm testing with the following config:
>>>
>>> # Test config start
>>> set daemon 10
>>>
>>> check program MyProgram with path "/bin/dash -c 'echo OK!; exit 1'"
>>>   every "06 * * * *"
>>>   if status != 0 then alert
>>> # Test config end
>>>
>>> As expected, monit runs the dash test program at 6 minutes past the hour.
>>> The dash script finishes immediately. However, Monit doesn't pick up,
>>> report or alert on the exit code in a timely manner. Until the next time
>>> Monit is scheduled to run the test script, the dash script remains as a
>>> zombie. But that is an hour later, which is a long time to wait to be
>>> alerted to the script failing.
>>>
>>> If the 'every' schedule was "06 0 * * *" then it would seem one should
>>> expect to wait 24 hours before being alerted to the script failing!
>>>
>>> I realise the Monit manual explains:
>>>
>>> "The asynchronous nature of the program check [...] comes with a
>>> side-effect: when the program has finished executing and is waiting for
>>> Monit to collect the result, it becomes a so-called "zombie" process [...]
>>> the zombie process is removed from the system as soon as Monit collects the
>>> exit status. This means that every "check program" will be associated with
>>> either a running process or a temporary zombie. This unwanted zombie
>>> side-effect will be removed in a later release of Monit."
>>>
>>> That may be so, however why doesn't Monit reap the child and collect the
>>> exit code at the *next poll cycle after the child exits* (i.e. within 10
>>> seconds of the test script finishing given the 'set daemon 10' line in the
>>> test config above) rather than when the program is next scheduled to be
>>> run? Maybe I'm missing something, but the current behaviour seems to
>>> undermine the entire purpose of providing alerts on program failure (when
>>> used in conjunction with cron-style scheduling). That is the behaviour I'd
>>> like to query the rationale for.
>>>
>>> Thanks in advance.
>>>
>>> Kind regards
>>>
>>> Struan
>
> --
> To unsubscribe:
> https://lists.nongnu.org/mailman/listinfo/monit-general
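
For readers of the archive: the behaviour asked for in the original message
(collect the exit status at the next poll cycle after the child exits, instead
of at the next scheduled run) can be illustrated with a non-blocking wait. The
following is only a minimal Python sketch of that general technique, not
Monit's code and not the planned fix. The function name poll_for_exit and its
parameters are made up for illustration, /bin/sh stands in for /bin/dash, and
the short poll interval stands in for 'set daemon 10'.

```python
import subprocess
import time


def poll_for_exit(proc, poll_interval, max_cycles):
    """Check once per poll cycle whether the child has exited.

    Returns (cycle, status) for the cycle in which the exit status was
    collected, or (None, None) if the child was still running after
    max_cycles. Hypothetical helper, for illustration only.
    """
    for cycle in range(1, max_cycles + 1):
        time.sleep(poll_interval)
        # Popen.poll() does not block; on POSIX it reaps the child if it
        # has already exited, so the zombie disappears at this point.
        status = proc.poll()
        if status is not None:
            return cycle, status
    return None, None


if __name__ == "__main__":
    # Same test program as in the config above, run via /bin/sh (assumption).
    proc = subprocess.Popen(["/bin/sh", "-c", "echo OK!; exit 1"])
    cycle, status = poll_for_exit(proc, poll_interval=0.1, max_cycles=50)
    print(f"exit status {status} collected at poll cycle {cycle}")
```

With this pattern the delay between the child exiting and its status being
seen is bounded by the poll interval rather than by the cron-style schedule.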
