[ 
https://issues.apache.org/jira/browse/FELIX-6663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17778991#comment-17778991
 ] 

Joerg Hoh commented on FELIX-6663:
----------------------------------

PR: https://github.com/apache/felix-dev/pull/239

> Warn if healthcheck execution takes too long
> --------------------------------------------
>
>                 Key: FELIX-6663
>                 URL: https://issues.apache.org/jira/browse/FELIX-6663
>             Project: Felix
>          Issue Type: Task
>          Components: Health Checks
>    Affects Versions: healthcheck.core 2.2.0
>            Reporter: Joerg Hoh
>            Priority: Major
>
> We monitor our system using Felix Healthchecks and require that some 
> healthchecks are reported OK at least every 5 seconds. For this we configured 
> the timeout in the HealthCheckOptions to 5 seconds.
> But on rare occasions the system goes unhealthy without a 
> healthcheck being executed. It even seems that none of the required 
> healthchecks are executed during that time at all.
> I already ruled out a few obvious cases (full GC, maxed out CPU), but I still 
> have a few cases which I cannot explain yet. Also, while checking the code, I 
> found that on every invocation of HealthcheckExecutor.execute() all 
> metadata for the healthchecks is collected, which requires access to the OSGi 
> service registry. My application also has situations where a lot of access to 
> the service registry happens, which can suffer from lock contention under 
> load, and that time is not included in the timeout calculation of the 
> healthchecks.
> As a first step I would like to add some more logging in case the overall 
> execution of the healthchecks exceeds the configured timeout.
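
A minimal sketch of the kind of logging described above, using only the JDK. It is not taken from the PR or from the Felix code base: the class TimedExecution, the methods exceeded() and executeAndWarn(), and the Supplier standing in for the overall healthcheck execution (including the metadata lookup) are all hypothetical names chosen for illustration.

```java
import java.time.Duration;
import java.util.function.Supplier;
import java.util.logging.Logger;

// Hypothetical helper; not part of the Felix Health Checks API.
final class TimedExecution {

    private static final Logger LOG = Logger.getLogger(TimedExecution.class.getName());

    // True if the measured wall-clock time is longer than the configured timeout.
    static boolean exceeded(long elapsedNanos, Duration timeout) {
        return elapsedNanos > timeout.toNanos();
    }

    // Runs the overall execution (assumed to cover metadata collection as well
    // as the checks themselves) and logs a warning if it took longer than the timeout.
    static <T> T executeAndWarn(Supplier<T> overallExecution, Duration timeout) {
        long start = System.nanoTime();
        try {
            return overallExecution.get();
        } finally {
            long elapsedNanos = System.nanoTime() - start;
            if (exceeded(elapsedNanos, timeout)) {
                LOG.warning("Overall healthcheck execution took "
                        + Duration.ofNanos(elapsedNanos).toMillis()
                        + " ms, configured timeout is " + timeout.toMillis() + " ms");
            }
        }
    }
}
```

Measuring around the whole call rather than around each individual check means that time spent waiting on the service registry would show up in the warning, even though it is not covered by the per-check timeout.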



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
