Re: [Bug 62318] healthcheck

Jim Jagielski Fri, 24 Aug 2018 10:05:56 -0700


> On Aug 24, 2018, at 12:05 PM, Eric Covener <[email protected]> wrote:
> 
> On Fri, Aug 24, 2018 at 11:57 AM Christophe JAILLET
> <[email protected]> wrote:
>> 
>> On 24/08/2018 at 16:40, Jim Jagielski wrote:
>>> I was wondering if someone wanted to provide a sanity check
>>> on the above PR and what's "expected" by the health check code.
>>> 
>>> It would be very easy to adjust so that hcinterval was not
>>> the time between successive checks but the interval between
>>> the end of one and the start of another, but I'm not sure that
>>> is as useful. In other words, I think the current behavior
>>> is right (but think the docs need to be updated), but am
>>> willing to have my mind changed :)
>>> 
>> Hi Jim,
>> 
>> the current behavior is also what I would expect.
>> If I configure a check every 10s, I would expect 6 checks each minute,
>> even if the test itself takes time to perform.
> 
> 
> Bug describes something else IIUC.  Because the watchdog calls us 10
> times per second, it continuously sees that the worker hasn't been
> health checked within the desired interval and queues up a check, it
> doesn't know one is queued.


But that is only an issue, afaict, if the time taken to do the health check is
greater than the interval chosen... Or am I misunderstanding? That is,
if the interval is 200ms, and the health check takes 100ms, all is fine, we
get 5 checks a second. 
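
To make the arithmetic concrete, here is a rough sketch comparing the two readings of the interval discussed above: start-to-start (the current behavior) and end-of-one-to-start-of-next. The function name and parameters are made up for illustration, and it assumes the first check starts at t=0, every check takes the same time, and checks never overlap. It is not how the module schedules anything.

```python
def check_starts(interval_ms, duration_ms, window_ms, end_to_start=False):
    """Return the start times (ms) of checks that begin inside the window.

    Hypothetical helper for illustration only.
    """
    # Start-to-start: a new check begins every interval_ms, whatever its duration.
    # End-to-start: the next check begins interval_ms after the previous one ends.
    period = interval_ms + duration_ms if end_to_start else interval_ms
    return list(range(0, window_ms, period))


# 200ms interval, 100ms per check, observed over one second.
current = check_starts(200, 100, 1000)
alternative = check_starts(200, 100, 1000, end_to_start=True)

print(len(current), current)          # 5 [0, 200, 400, 600, 800]
print(len(alternative), alternative)  # 4 [0, 300, 600, 900]
```

With the start-to-start reading the check rate depends only on the interval, which matches the "10s means 6 checks a minute" expectation; with the end-to-start reading the rate also drifts with how long each check takes.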

I guess what we could do is emit a warning if, when a check is queued, we
already have one queued or in progress. This would give some info to the sysadmin.
We could also track the time taken to perform a check and have that available
via mod_status as well. But these all assume that the underlying logic, and
how it's implemented, is sane.
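
Something along these lines is what I have in mind, as a minimal sketch only. The class, its fields, and the logger name are hypothetical and are not taken from the module's code. Skipping the duplicate request is an extra assumption of the sketch; the idea above is only the warning and the timing.

```python
import logging
import time

log = logging.getLogger("hcheck_sketch")  # hypothetical logger name


class WorkerCheckState:
    """Hypothetical per-worker bookkeeping; not the module's real data structure."""

    def __init__(self):
        self.pending = False       # a check is queued or currently running
        self.overlaps = 0          # requests that arrived while one was pending
        self.last_duration = None  # seconds taken by the last completed check

    def request_check(self):
        """Called when the watchdog decides a check is due.

        Returns True if a new check should be queued.
        """
        if self.pending:
            self.overlaps += 1
            log.warning(
                "health check requested while one is already queued or "
                "running (%d so far)", self.overlaps)
            return False  # assumption of this sketch: skip the duplicate
        self.pending = True
        return True

    def run(self, check, clock=time.monotonic):
        """Run the check callable, recording how long it took."""
        started = clock()
        try:
            return check()
        finally:
            self.last_duration = clock() - started
            self.pending = False


state = WorkerCheckState()
first = state.request_check()   # True: nothing pending yet
second = state.request_check()  # False: one is already pending, warning logged
result = state.run(lambda: "ok")
```

The overlap count and last_duration are the kind of values that could then be surfaced to the sysadmin, for example through a status page.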
