Re: Zombie processes and exit code retrieval

Martin Pala Tue, 23 Jun 2015 02:17:14 -0700

There are many more changes in progress (process engine refactoring, etc.), so 
it may take ca. 1-2 months.




> On 22 Jun 2015, at 22:42, Struan Bartlett <[email protected]> 
> wrote:
> 
> Thanks for the swift response, and that's great to hear. Based on your 
> experience, do you think this is likely to take weeks or months before being 
> available?
> 
> On 22/06/2015 20:13, Martin Pala wrote:
>> Hi,
>> 
>> the refactoring of the test scheduler mentioned in the manual, with a fix 
>> for program execution, has already begun.
>> 
>> Regards,
>> Martin
>> 
>> 
>>> On 22 Jun 2015, at 20:05, Struan Bartlett <[email protected]> wrote:
>>> 
>>> Hi
>>> 
>>> I'd like to query the rationale for a behaviour I've been experiencing in 
>>> Monit. I'm testing with the following config:
>>> 
>>> # Test config start
>>> set daemon 10
>>> 
>>> check program MyProgram with path "/bin/dash -c 'echo OK!; exit 1'"
>>>    every "06 * * * *"
>>>    if status != 0 then alert
>>> # Test config end
>>> 
>>> As expected, Monit runs the dash test program at 6 minutes past the hour. 
>>> The dash script finishes immediately. However, Monit doesn't pick up, 
>>> report or alert on the exit code in a timely manner. Until the next time 
>>> Monit is scheduled to run the test script, the dash script remains as a 
>>> zombie. But that is an hour later, which is a long time to wait to be 
>>> alerted to the script failing.
>>> 
>>> If the 'every' schedule was "06 0 * * *" then it would seem one should 
>>> expect to wait 24 hours before being alerted to the script failing!
>>> 
>>> I realise the Monit manual explains:
>>> 
>>> "The asynchronous nature of the program check [...] comes with a 
>>> side-effect: when the program has finished executing and is waiting for 
>>> Monit to collect the result, it becomes a so-called "zombie" process [...] 
>>> the zombie process is removed from the system as soon as Monit collects the 
>>> exit status. This means that every "check program" will be associated with 
>>> either a running process or a temporary zombie. This unwanted zombie 
>>> side-effect will be removed in a later release of Monit."
>>> 
>>> That may be so, but why doesn't Monit reap the child and collect the 
>>> exit code at the *next poll cycle after the child exits* (i.e. within 10 
>>> seconds of the test script finishing given the 'set daemon 10' line in the 
>>> test config above) rather than when the program is next scheduled to be 
>>> run? Maybe I'm missing something, but the current behaviour seems to 
>>> undermine the entire purpose of providing alerts on program failure (when 
>>> used in conjunction with cron-style scheduling). That is the behaviour I'd 
>>> like to query the rationale for.
>>> 
>>> Thanks in advance.
>>> 
>>> Kind regards
>>> 
>>> Struan
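
For illustration only, the behaviour asked about above (collecting the exit
status at the next poll cycle after the child exits, rather than at the next
scheduled run) can be sketched with a non-blocking waitpid(). This is not
Monit's code, which is written in C; it is a minimal Python sketch under
stated assumptions. The names start_check, try_reap, run_and_collect and
POLL_INTERVAL are hypothetical, /bin/sh stands in for /bin/dash, and the
poll interval is shortened so the demo finishes quickly.

```python
import os
import time

# Hypothetical stand-in for Monit's 'set daemon 10' poll cycle,
# shortened from 10 seconds so the sketch runs quickly.
POLL_INTERVAL = 0.1


def start_check(argv):
    """Fork and exec the test program; return the child's PID."""
    pid = os.fork()
    if pid == 0:
        # Child: replace the process image with the test program.
        try:
            os.execv(argv[0], argv)
        finally:
            # Only reached if exec failed.
            os._exit(127)
    return pid


def try_reap(pid):
    """Non-blocking reap: return the exit code, or None if still running."""
    reaped, status = os.waitpid(pid, os.WNOHANG)
    if reaped == 0:
        return None  # child has not exited yet
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    # Killed by a signal: report it as a negative number.
    return -os.WTERMSIG(status)


def run_and_collect(argv, max_cycles=100):
    """Start the program, then try to reap it once per poll cycle.

    Returns (exit_code, cycles_waited).
    """
    pid = start_check(argv)
    for cycle in range(1, max_cycles + 1):
        time.sleep(POLL_INTERVAL)
        code = try_reap(pid)
        if code is not None:
            # The zombie is gone as soon as waitpid() returns its status.
            return code, cycle
    raise TimeoutError("child still running after %d cycles" % max_cycles)


if __name__ == "__main__":
    code, cycles = run_and_collect(["/bin/sh", "-c", "echo OK!; exit 1"])
    print("exit code %d collected after %d poll cycle(s)" % (code, cycles))
```

With this approach the zombie lives for at most about one poll interval
after the program exits, independent of how far away the next cron-style
'every' slot is.
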
> 
> --
> To unsubscribe:
> https://lists.nongnu.org/mailman/listinfo/monit-general

