Thanks for the swift response, and that's great to hear. Based on your
experience, do you think this is likely to take weeks or months before
being available?

On 22/06/2015 20:13, Martin Pala wrote:
> Hi,
>
> the refactoring of the test scheduler mentioned in the manual, with a fix
> for program execution, has already begun.
>
> Regards,
> Martin
>
>
>> On 22 Jun 2015, at 20:05, Struan Bartlett wrote:
>>
>> Hi
>>
>> I'd like to query the rationale for a behaviour I've been experiencing in
>> Monit. I'm testing with the following config:
>>
>> # Test config start
>> set daemon 10
>>
>> check program MyProgram with path "/bin/dash -c 'echo OK!; exit 1'"
>>    every "06 * * * *"
>>    if status != 0 then alert
>> # Test config end
>>
>> As expected, Monit runs the dash test program at 6 minutes past the
>> hour. The dash script finishes immediately. However, Monit doesn't
>> pick up, report or alert on the exit code in a timely manner. Until
>> the next time Monit is scheduled to run the test script, the dash
>> script remains as a zombie. But that is an hour later, which is a
>> long time to wait to be alerted to the script failing.
>>
>> If the 'every' schedule was "06 0 * * *" then it would seem one
>> should expect to wait 24 hours before being alerted to the script
>> failing!
>>
>> I realise the Monit manual explains:
>>
>> "The asynchronous nature of the program check [...] comes with a
>> side-effect: when the program has finished executing and is waiting
>> for Monit to collect the result, it becomes a so-called "zombie"
>> process [...] the zombie process is removed from the system as soon
>> as Monit collects the exit status. This means that every "check
>> program" will be associated with either a running process or a
>> temporary zombie. This unwanted zombie side-effect will be removed in
>> a later release of Monit."
>>
>> That may be so, but why doesn't Monit reap the child and collect
>> the exit code at the *next poll cycle after the child exits* (i.e.
>> within 10 seconds of the test script finishing given the 'set daemon
>> 10' line in the test config above) rather than when the program is
>> next scheduled to be run? Maybe I'm missing something, but the
>> current behaviour seems to undermine the entire purpose of providing
>> alerts on program failure (when used in conjunction with cron-style
>> scheduling). That is the behaviour I'd like to query the rationale for.
>>
>> Thanks in advance.
>>
>> Kind regards
>>
>> Struan
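
For anyone following the thread, the behaviour asked about in the quoted message (collecting the exit status at the next poll cycle rather than at the next scheduled run) can be sketched roughly as below. This is not Monit's code, only a minimal Python illustration of a non-blocking reap with waitpid() and WNOHANG. The names start_check, try_collect and run_until_collected are hypothetical, /bin/sh stands in for /bin/dash, and POLL_INTERVAL is a shortened stand-in for 'set daemon 10'.

```python
import os
import time

POLL_INTERVAL = 0.1  # stand-in for 'set daemon 10', shortened for the demo


def start_check(command):
    """Start the check program without waiting for it; return its PID."""
    return os.spawnv(os.P_NOWAIT, "/bin/sh", ["sh", "-c", command])


def try_collect(pid):
    """Non-blocking reap: return the exit code if the child has finished,
    or None if it is still running."""
    reaped, status = os.waitpid(pid, os.WNOHANG)
    if reaped == 0:
        return None  # child has not exited yet
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)  # child was killed by a signal


def run_until_collected(command, max_cycles=50):
    """Poll once per cycle; return (exit_code, cycles_waited).
    exit_code is None if the child was still running after max_cycles."""
    pid = start_check(command)
    for cycle in range(1, max_cycles + 1):
        time.sleep(POLL_INTERVAL)
        code = try_collect(pid)
        if code is not None:
            return code, cycle
    return None, max_cycles


if __name__ == "__main__":
    code, cycles = run_until_collected("echo OK!; exit 1")
    print("exit code", code, "collected after", cycles, "poll cycle(s)")
```

With this approach the zombie exists only until the next poll cycle after the child exits, so an alert on a non-zero status would not have to wait for the next cron-style run.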

