Thanks for the swift response; that's great to hear. Based on your experience, is the fix likely to take weeks or months to become available?
On 22/06/2015 20:13, Martin Pala wrote:
> Hi,
>
> the refactoring of the test scheduler mentioned in the manual with fix
> for program execution has already begun.
>
> Regards,
> Martin
>
>
>> On 22 Jun 2015, at 20:05, Struan Bartlett
>> <[email protected]
>> <mailto:[email protected]>> wrote:
>>
>> Hi
>>
>> I'd like to query the rationale for a behaviour I've been experiencing in
>> monit. I'm testing with the following config:
>>
>> # Test config start
>> set daemon 10
>>
>> check program MyProgram with path "/bin/dash -c 'echo OK!; exit 1'"
>> every "06 * * * *"
>> if status != 0 then alert
>> # Test config end
>>
>> As expected, monit runs the dash test program at 6 minutes past the
>> hour. The dash script finishes immediately. However, Monit doesn't
>> pick up, report or alert on the exit code in a timely manner. Until
>> the next time Monit is scheduled to run the test script, the dash
>> script remains as a zombie. But that is an hour later, which is a
>> long time to wait to be alerted to the script failing.
>>
>> If the 'every' schedule was "06 0 * * *" then it would seem one
>> should expect to wait 24 hours before being alerted to the script
>> failing!
>>
>> I realise the Monit manual explains:
>>
>> "The asynchronous nature of the program check [...] comes with a
>> side-effect: when the program has finished executing and is waiting
>> for Monit to collect the result, it becomes a so-called "zombie"
>> process [...] the zombie process is removed from the system as soon
>> as Monit collects the exit status. This means that every "check
>> program" will be associated with either a running process or a
>> temporary zombie. This unwanted zombie side-effect will be removed in
>> a later release of Monit."
>>
>> That may be so, however why doesn't Monit reap the child and collect
>> the exit code at the *next poll cycle after the child exits* (i.e.
>> within 10 seconds of the test script finishing given the 'set daemon
>> 10' line in the test config above) rather than when the program is
>> next scheduled to be run? Maybe I'm missing something, but the
>> current behaviour seems to undermine the entire purpose of providing
>> alerts on program failure (when used in conjunction with cron-style
>> scheduling). That is the behaviour I'd like to query the rationale for.
>>
>> Thanks in advance.
>>
>> Kind regards
>>
>> Struan
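
For illustration only, here is a minimal Python sketch of the behaviour asked about in the quoted message: checking the child at every poll cycle and collecting its exit status as soon as it has exited, instead of waiting for the next scheduled run. This is not Monit's code. The helper name run_and_reap and its parameters are hypothetical, the short sleep stands in for 'set daemon 10', and a Python one-liner stands in for the dash test program.

```python
import subprocess
import sys
import time


def run_and_reap(argv, poll_interval=0.1, max_cycles=50):
    """Start argv, then check on it once per poll cycle (hypothetical helper).

    Returns (exit_status, cycle) as soon as the child has exited, or
    (None, max_cycles) if it is still running after max_cycles cycles.
    """
    child = subprocess.Popen(argv)
    for cycle in range(1, max_cycles + 1):
        time.sleep(poll_interval)  # stands in for the 'set daemon' interval
        status = child.poll()      # non-blocking; reaps the child if it has exited
        if status is not None:
            return status, cycle
    # Give up: stop the child and reap it so no zombie is left behind.
    child.kill()
    child.wait()
    return None, max_cycles


if __name__ == "__main__":
    # Python analogue of the test program: print OK! and exit with status 1.
    test_program = [sys.executable, "-c", "print('OK!'); raise SystemExit(1)"]
    status, cycle = run_and_reap(test_program)
    if status != 0:
        print(f"alert: exit status {status} collected at poll cycle {cycle}")
```

With this approach the delay between the program exiting and the alert is bounded by one poll interval, independent of how rarely the program is scheduled to run.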
-- To unsubscribe: https://lists.nongnu.org/mailman/listinfo/monit-general
