Re: Zombie processes and exit code retrieval

Struan Bartlett Mon, 28 Mar 2016 13:36:51 -0700

Hi (Martin?)

Please could you give me an update on progress resolving the issue
identified below? I've looked at the changelog for recent versions of
Monit but haven't been able to determine whether the refactoring of the
test scheduler that you mentioned had begun has now been completed.


Regards

Struan

On 23/06/2015 10:16, Martin Pala wrote:
> There are many more changes in progress (process engine refactoring,
> etc.) … may take ca. 1-2 months.
>
>
>
>> On 22 Jun 2015, at 22:42, Struan Bartlett <[email protected]> wrote:
>>
>> Thanks for the swift response, and that's great to hear. Based on
>> your experience, do you think this is likely to take weeks or months
>> before being available?
>>
>> On 22/06/2015 20:13, Martin Pala wrote:
>>> Hi,
>>>
>>> the refactoring of the test scheduler mentioned in the manual, with a
>>> fix for program execution, has already begun.
>>>
>>> Regards,
>>> Martin
>>>
>>>
>>>> On 22 Jun 2015, at 20:05, Struan Bartlett <[email protected]> wrote:
>>>>
>>>> Hi
>>>>
>>>> I'd like to query the rationale for a behaviour I'm experiencing
>>>> in Monit. I'm testing with the following config:
>>>>
>>>> # Test config start
>>>> set daemon 10
>>>>
>>>> check program MyProgram with path "/bin/dash -c 'echo OK!; exit 1'"
>>>>    every "06 * * * *"
>>>>    if status != 0 then alert
>>>> # Test config end
>>>>
>>>> As expected, monit runs the dash test program at 6 minutes past the
>>>> hour. The dash script finishes immediately. However, Monit doesn't
>>>> pick up, report or alert on the exit code in a timely manner. Until
>>>> the next time Monit is scheduled to run the test script, the dash
>>>> script remains as a zombie. But that is an hour later, which is a
>>>> long time to wait to be alerted to the script failing.
>>>>
>>>> If the 'every' schedule was "06 0 * * *" then it would seem one
>>>> should expect to wait 24 hours before being alerted to the script
>>>> failing!
>>>>
>>>> I realise the Monit manual explains:
>>>>
>>>> "The asynchronous nature of the program check [...] comes with a
>>>> side-effect: when the program has finished executing and is waiting
>>>> for Monit to collect the result, it becomes a so-called "zombie"
>>>> process [...] the zombie process is removed from the system as soon
>>>> as Monit collects the exit status. This means that every "check
>>>> program" will be associated with either a running process or a
>>>> temporary zombie. This unwanted zombie side-effect will be removed
>>>> in a later release of Monit."
>>>>
>>>> That may be so, but why doesn't Monit reap the child and
>>>> collect the exit code at the *next poll cycle after the child
>>>> exits* (i.e. within 10 seconds of the test script finishing given
>>>> the 'set daemon 10' line in the test config above) rather than when
>>>> the program is next scheduled to be run? Maybe I'm missing
>>>> something, but the current behaviour seems to undermine the entire
>>>> purpose of providing alerts on program failure (when used in
>>>> conjunction with cron-style scheduling). That is the behaviour I'd
>>>> like to query the rationale for.
>>>>
>>>> Thanks in advance.
>>>>
>>>> Kind regards
>>>>
>>>> Struan
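
The behaviour asked for in the message above (collect the exit status at the next poll cycle after the child exits, rather than at the next scheduled run) can be sketched roughly as follows. This is only an illustration in Python, not Monit's implementation: the names run_and_reap and POLL_INTERVAL are hypothetical, the poll interval is shortened from the 10 seconds of 'set daemon 10' so the demo finishes quickly, and a Python one-liner stands in for the dash test program.

```python
import subprocess
import sys
import time

# Hypothetical stand-in for 'set daemon 10', shortened for the demo.
POLL_INTERVAL = 0.1


def run_and_reap(argv, poll_interval=POLL_INTERVAL, max_cycles=100):
    """Start argv, then check once per poll cycle whether it has exited.

    Returns (exit_status, cycles_waited). exit_status is None if the
    child is still running after max_cycles poll cycles.
    """
    child = subprocess.Popen(argv, stdout=subprocess.DEVNULL)
    for cycle in range(1, max_cycles + 1):
        time.sleep(poll_interval)
        # Non-blocking check: if the child has exited, this reaps it
        # (so no zombie is left behind) and returns its exit status.
        status = child.poll()
        if status is not None:
            return status, cycle
    return None, max_cycles


if __name__ == "__main__":
    # Assumed stand-in for "/bin/dash -c 'echo OK!; exit 1'".
    test_program = [sys.executable, "-c", "print('OK!'); raise SystemExit(1)"]
    status, cycles = run_and_reap(test_program)
    if status != 0:
        print(f"alert: exit status {status} collected after {cycles} poll cycle(s)")
```

With this approach the delay between the program exiting and the alert is bounded by one poll interval, independent of how rarely the 'every' schedule starts the program.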
>>
>> --
>> To unsubscribe:
>> https://lists.nongnu.org/mailman/listinfo/monit-general

-- 

Struan Bartlett
NewsNow.co.uk

The UK's #1 News Portal:
www.NewsNow.co.uk (est. 1998)

Tel:    +44 (0)845 838 8890
Fax:    +44 (0)845 838 8898

NewsNow Publishing Limited, trading also as NewsNow.co.uk, is a company
registered in England and Wales under company no. 3435857 with
registered office The Euston Office, 1 Euston Square, 40 Melton Street,
London NW1 2FD

