Re: Zombie processes and exit code retrieval

Struan Bartlett Mon, 22 Jun 2015 13:42:42 -0700

Thanks for the swift response, and that's great to hear. Based on your
experience, do you think this is likely to take weeks or months before
being available?


On 22/06/2015 20:13, Martin Pala wrote:
> Hi,
>
> the refactoring of the test scheduler mentioned in the manual, with a
> fix for program execution, has already begun.
>
> Regards,
> Martin
>
>
>> On 22 Jun 2015, at 20:05, Struan Bartlett <[email protected]> wrote:
>>
>> Hi
>>
>> I'd like to query the rationale for a behaviour I've been experiencing
>> in monit. I'm testing with the following config:
>>
>> # Test config start
>> set daemon 10
>>
>> check program MyProgram with path "/bin/dash -c 'echo OK!; exit 1'"
>>    every "06 * * * *"
>>    if status != 0 then alert
>> # Test config end
>>
>> As expected, monit runs the dash test program at 6 minutes past the
>> hour. The dash script finishes immediately. However, Monit doesn't
>> pick up, report or alert on the exit code in a timely manner. Until
>> the next time Monit is scheduled to run the test script, the dash
>> script remains as a zombie. But that is an hour later, which is a
>> long time to wait to be alerted to the script failing.
>>
>> If the 'every' schedule was "06 0 * * *" then it would seem one
>> should expect to wait 24 hours before being alerted to the script
>> failing!
>>
>> I realise the Monit manual explains:
>>
>> "The asynchronous nature of the program check [...] comes with a
>> side-effect: when the program has finished executing and is waiting
>> for Monit to collect the result, it becomes a so-called "zombie"
>> process [...] the zombie process is removed from the system as soon
>> as Monit collects the exit status. This means that every "check
>> program" will be associated with either a running process or a
>> temporary zombie. This unwanted zombie side-effect will be removed in
>> a later release of Monit."
>>
>> That may be so, however why doesn't Monit reap the child and collect
>> the exit code at the *next poll cycle after the child exits* (i.e.
>> within 10 seconds of the test script finishing given the 'set daemon
>> 10' line in the test config above) rather than when the program is
>> next scheduled to be run? Maybe I'm missing something, but the
>> current behaviour seems to undermine the entire purpose of providing
>> alerts on program failure (when used in conjunction with cron-style
>> scheduling). That is the behaviour I'd like to query the rationale for.
>>
>> Thanks in advance.
>>
>> Kind regards
>>
>> Struan
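
For reference, the behaviour asked about in the quoted message (collecting the exit code at the next poll cycle after the child exits, rather than at the next scheduled run) can be sketched roughly as below. This is only an illustration of a non-blocking reap with waitpid() and WNOHANG, not Monit's actual implementation; the names reap_if_finished and poll_until_exit are hypothetical, and poll_interval merely stands in for the 'set daemon' interval.

```python
import os
import time


def reap_if_finished(pid):
    """Non-blocking reap: return the exit code if the child has exited, else None."""
    # WNOHANG makes waitpid() return (0, 0) immediately while the child is still running.
    reaped_pid, status = os.waitpid(pid, os.WNOHANG)
    if reaped_pid == 0:
        return None
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    # Child was killed by a signal; report it as a negative number.
    return -os.WTERMSIG(status)


def poll_until_exit(argv, poll_interval=10.0, max_cycles=6):
    """Start argv once, then check on it every poll cycle (hypothetical helper).

    Returns the exit code as soon as a cycle finds the child finished,
    or None if it is still running after max_cycles (the child is then
    left for a later cycle to reap).
    """
    pid = os.spawnvp(os.P_NOWAIT, argv[0], argv)
    for _ in range(max_cycles):
        time.sleep(poll_interval)
        code = reap_if_finished(pid)
        if code is not None:
            return code  # zombie is gone once waitpid() has collected the status
    return None


if __name__ == "__main__":
    # Short interval for demonstration; the config in the thread uses 10 seconds.
    code = poll_until_exit(["sh", "-c", "echo OK!; exit 1"], poll_interval=0.1)
    if code is not None and code != 0:
        print("alert: program exited with status", code)
```

With this approach the child would stay a zombie for at most one poll interval after it exits, independent of how rarely the program itself is scheduled.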

--
To unsubscribe:
https://lists.nongnu.org/mailman/listinfo/monit-general
