[ https://issues.apache.org/jira/browse/FELIX-6663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17778991#comment-17778991 ]
Joerg Hoh commented on FELIX-6663:
----------------------------------

PR: https://github.com/apache/felix-dev/pull/239

> Warn if healthcheck execution takes too long
> --------------------------------------------
>
>                 Key: FELIX-6663
>                 URL: https://issues.apache.org/jira/browse/FELIX-6663
>             Project: Felix
>          Issue Type: Task
>          Components: Health Checks
>    Affects Versions: healthcheck.core 2.2.0
>            Reporter: Joerg Hoh
>            Priority: Major
>
> We monitor our system using Felix Healthchecks and require that some
> healthchecks report OK at least every 5 seconds. For this we configured
> the timeout in the HealthCheckOptions to 5 seconds.
> On rare occasions, however, the system goes unhealthy without a
> healthcheck being executed. It even seems that none of the required
> healthchecks is executed at all during that time.
> I have already ruled out a few obvious causes (full GC, maxed-out CPU),
> but I still have a few cases which I cannot explain yet. While checking
> the code, I also found that every invocation of
> HealthcheckExecutor.execute() collects the metadata for all healthchecks,
> which requires access to the OSGi service registry. My application also
> has situations in which the service registry is accessed heavily, which
> can suffer from lock contention under load, and that time is not included
> in the timeout calculation of the healthchecks.
> As a first step I would like to add more logging for the case where the
> overall execution of the healthchecks exceeds the configured timeout.
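
To illustrate the kind of logging described above, here is a minimal sketch in plain Java. It is not the code from the PR and does not use the Felix API: the class TimedExecution, the methods exceeded() and runAndWarn(), and the Supplier standing in for the whole execution (metadata collection plus the checks themselves) are all hypothetical names chosen for illustration. The idea is only to measure the wall-clock time of the complete call, so that time spent waiting on the service registry is counted, and to warn when it exceeds the configured timeout.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;
import java.util.logging.Logger;

// Hypothetical helper, not part of Apache Felix.
class TimedExecution {

    private static final Logger LOG = Logger.getLogger(TimedExecution.class.getName());

    // True if the measured time is strictly greater than the configured timeout.
    public static boolean exceeded(long elapsedMillis, long timeoutMillis) {
        return elapsedMillis > timeoutMillis;
    }

    // Runs the complete execution (assumed to include metadata lookup and the
    // checks themselves) and logs a warning if it took longer than the timeout.
    public static <T> T runAndWarn(Supplier<T> overallExecution, long timeoutMillis) {
        long start = System.nanoTime(); // monotonic clock, unaffected by wall-clock changes
        try {
            return overallExecution.get();
        } finally {
            long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            if (exceeded(elapsedMillis, timeoutMillis)) {
                LOG.warning("Overall healthcheck execution took " + elapsedMillis
                        + " ms, exceeding the configured timeout of " + timeoutMillis + " ms");
            }
        }
    }
}
```

A caller would wrap the whole execution, for example TimedExecution.runAndWarn(() -> runAllChecks(), 5000), where runAllChecks() is again a placeholder. Measuring in a finally block means the warning is also emitted when the execution throws.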