[
https://issues.apache.org/jira/browse/SLING-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384034#comment-15384034
]
Georg Henzler commented on SLING-5867:
--------------------------------------
bq. ... in case of a non-returning response ... the health check will just
timeout but never actually really fail (even after a very long time).
This should not be the case: the code at
https://github.com/apache/sling/blob/eecc7e401a0894984a5eaa8992dedfcb5a18e0e5/bundles/extensions/healthcheck/core/src/main/java/org/apache/sling/hc/core/impl/executor/HealthCheckExecutorImpl.java#L432
should make the check fail eventually, after 5 minutes (configurable via
https://github.com/apache/sling/blob/eecc7e401a0894984a5eaa8992dedfcb5a18e0e5/bundles/extensions/healthcheck/core/src/main/java/org/apache/sling/hc/core/impl/executor/HealthCheckExecutorImpl.java#L88).
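To illustrate the idea of a check that first reports a timeout and only later counts as failed, here is a minimal sketch. It is not the code from HealthCheckExecutorImpl; the class, the enum, and the method names are hypothetical, and the status names WARN and CRITICAL are assumptions made for the example. Only the 5 minute threshold is taken from the comment above.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch, not the Sling implementation.
class TwoStageTimeoutSketch {

    enum Status { OK, WARN, CRITICAL }

    // A check that is still running is a warning at first and becomes
    // a failure once it has been running longer than the threshold.
    static Status classifyRunning(long elapsedMs, long criticalThresholdMs) {
        return elapsedMs < criticalThresholdMs ? Status.WARN : Status.CRITICAL;
    }

    // Waits up to waitMs for the result; if none arrives, classifies the
    // still-running check by how long ago it was started.
    static Status collect(Future<Status> future, long startedAtMs, long nowMs,
                          long waitMs, long criticalThresholdMs) throws InterruptedException {
        try {
            return future.get(waitMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return classifyRunning(nowMs - startedAtMs, criticalThresholdMs);
        } catch (ExecutionException e) {
            // The check itself threw an exception.
            return Status.CRITICAL;
        }
    }

    public static void main(String[] args) throws Exception {
        Future<Status> stuck = new CompletableFuture<>(); // never completes
        long fiveMinutesMs = TimeUnit.MINUTES.toMillis(5);
        System.out.println(collect(stuck, 0L, 10_000L, 50L, fiveMinutesMs));  // WARN
        System.out.println(collect(stuck, 0L, 360_000L, 50L, fiveMinutesMs)); // CRITICAL
    }
}
```

The point of the two stages is that a caller gets an answer quickly (the short wait), while a check that never returns still ends up reported as failed once the longer threshold has passed.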
bq. ... create a dedicated timeout handling within the
SlingRequestStatusHealthCheck (separate from the timeout in
HealthCheckExecutorImpl) because for each individual request health check
configuration you might want to set different timeouts.
I think the maximum time you can wait for a response depends much more on where
you are calling from (e.g. a load balancer or a human looking at a dashboard)
than on a fixed set of tags or a particular check (hence configuring this per
check or tag does not make much sense IMHO). So at the moment, timeout handling
is done by using
* a global default
* a per call setting when using the HC Executor (e.g. the request param
"timeout" of the HC servlet that sets the HC executor option at
https://github.com/apache/sling/blob/eecc7e401a0894984a5eaa8992dedfcb5a18e0e5/bundles/extensions/healthcheck/core/src/main/java/org/apache/sling/hc/api/execution/HealthCheckExecutionOptions.java#L26)
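The two levels above (global default, per call override) could be resolved roughly as in the following sketch. Everything in it is an assumption for illustration: the class and method names are hypothetical, the 2000 ms default is a placeholder value, and the real servlet and HealthCheckExecutionOptions may parse and validate the "timeout" parameter differently.

```java
import java.util.Collections;
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical sketch of "per call override, else global default".
class TimeoutResolutionSketch {

    // Placeholder value for the global default, assumed for this example.
    static final long GLOBAL_DEFAULT_TIMEOUT_MS = 2000L;

    // Reads the "timeout" request parameter; missing, empty, non-numeric
    // or non-positive values are treated as "no override".
    static OptionalLong parseTimeoutParam(Map<String, String> params) {
        String raw = params.get("timeout");
        if (raw == null || raw.trim().isEmpty()) {
            return OptionalLong.empty();
        }
        try {
            long value = Long.parseLong(raw.trim());
            return value > 0 ? OptionalLong.of(value) : OptionalLong.empty();
        } catch (NumberFormatException e) {
            return OptionalLong.empty();
        }
    }

    // The caller's override wins; otherwise the global default applies.
    static long effectiveTimeoutMs(OptionalLong perCallOverride) {
        return perCallOverride.orElse(GLOBAL_DEFAULT_TIMEOUT_MS);
    }

    public static void main(String[] args) {
        Map<String, String> loadBalancer = Collections.singletonMap("timeout", "500");
        Map<String, String> dashboard = Collections.emptyMap();
        System.out.println(effectiveTimeoutMs(parseTimeoutParam(loadBalancer))); // 500
        System.out.println(effectiveTimeoutMs(parseTimeoutParam(dashboard)));    // 2000
    }
}
```

This keeps the decision with the caller: a load balancer can ask for a short timeout while a dashboard simply uses the default, without any per check or per tag configuration.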
> SlingRequestStatusHealthCheck should add timeout support
> --------------------------------------------------------
>
> Key: SLING-5867
> URL: https://issues.apache.org/jira/browse/SLING-5867
> Project: Sling
> Issue Type: Bug
> Components: Health Check
> Affects Versions: Health Check Support 1.0.4
> Reporter: Konrad Windszus
> Assignee: Konrad Windszus
>
> Currently {{o.a.s.hc.support.impl.SlingRequestStatusHealthCheck}} just
> synchronously calls {{SlingRequestProcessor.processResponse}}.
> That means in case of a non-returning response (e.g. caused by a deadlock
> like SLING-5847) the health check will just timeout but never actually really
> fail (even after a very long time).
> In this case it is good to create a dedicated timeout handling within the
> {{SlingRequestStatusHealthCheck}} (separate from the timeout in
> {{HealthCheckExecutorImpl}}) because for each individual request health check
> configuration you might want to set different timeouts.
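
For reference, the dedicated timeout handling proposed in the description could look roughly like the sketch below: the synchronous call is moved to a worker thread and the caller waits with a bound. This is only an illustration under stated assumptions; the class and method names are hypothetical, and nothing here is taken from SlingRequestStatusHealthCheck or SlingRequestProcessor.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch of bounding a synchronous call with a timeout.
class BoundedCallSketch {

    // Runs the call on a daemon worker thread and waits at most timeoutMs.
    // Returns onTimeout if the call has not finished in time.
    static <T> T callWithTimeout(Callable<T> call, long timeoutMs, T onTimeout) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread thread = new Thread(runnable, "bounded-call");
            thread.setDaemon(true); // do not keep the JVM alive for a stuck call
            return thread;
        });
        Future<T> future = executor.submit(call);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Interrupts the worker; a truly deadlocked thread may not react.
            future.cancel(true);
            return onTimeout;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(callWithTimeout(() -> "200", 1000L, "TIMEOUT"));
        System.out.println(callWithTimeout(() -> {
            Thread.sleep(5000L); // stands in for a request that does not return
            return "200";
        }, 50L, "TIMEOUT"));
    }
}
```

Note that the timeout only frees the caller. If the underlying request is deadlocked, the worker thread most likely stays blocked, so repeated executions could pile up stuck threads unless that is handled separately.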