[ 
https://issues.apache.org/jira/browse/SLING-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384034#comment-15384034
 ] 

Georg Henzler commented on SLING-5867:
--------------------------------------

bq. ... in case of a non-returning response ... the health check will just 
timeout but never actually really fail (even after a very long time).

This should not be the case: the code at 
https://github.com/apache/sling/blob/eecc7e401a0894984a5eaa8992dedfcb5a18e0e5/bundles/extensions/healthcheck/core/src/main/java/org/apache/sling/hc/core/impl/executor/HealthCheckExecutorImpl.java#L432
 should eventually make the check fail after 5 minutes (the threshold is configurable, see 
https://github.com/apache/sling/blob/eecc7e401a0894984a5eaa8992dedfcb5a18e0e5/bundles/extensions/healthcheck/core/src/main/java/org/apache/sling/hc/core/impl/executor/HealthCheckExecutorImpl.java#L88).
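
To illustrate the idea of a check that first times out and only later counts as failed, here is a minimal standalone sketch. It is not a copy of the linked HealthCheckExecutorImpl code: the class, method and status names are hypothetical, and the 2 second timeout is an assumed value. Only the 5 minute threshold is taken from the comment above.

```java
import java.time.Duration;

// Hypothetical sketch, not the Sling implementation: classifies a check
// that is still running based on how long it has been running.
final class TimeoutEscalationSketch {

    // Illustrative status names (assumptions, not Sling's Result.Status).
    enum Status { PENDING, TIMED_OUT, FAILED }

    // Assumed timeout for a single call; the real default may differ.
    static final Duration ASSUMED_TIMEOUT = Duration.ofSeconds(2);

    // The "5 minutes" mentioned in the comment above.
    static final Duration LONG_RUNNING_THRESHOLD = Duration.ofMinutes(5);

    static Status statusOfRunningCheck(Duration elapsed,
                                       Duration timeout,
                                       Duration longRunningThreshold) {
        if (elapsed.compareTo(longRunningThreshold) >= 0) {
            // Running far too long: report a real failure.
            return Status.FAILED;
        }
        if (elapsed.compareTo(timeout) >= 0) {
            // Past the timeout, but not yet considered failed.
            return Status.TIMED_OUT;
        }
        // Still within the time budget.
        return Status.PENDING;
    }

    public static void main(String[] args) {
        Duration[] samples = {
            Duration.ofSeconds(1), Duration.ofSeconds(30), Duration.ofMinutes(6)
        };
        for (Duration elapsed : samples) {
            System.out.println(elapsed + " -> "
                + statusOfRunningCheck(elapsed, ASSUMED_TIMEOUT, LONG_RUNNING_THRESHOLD));
        }
    }
}
```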

bq. ... create a dedicated timeout handling within the 
SlingRequestStatusHealthCheck (separate from the timeout in 
HealthCheckExecutorImpl) because for each individual request health check 
configuration you might want to set different timeouts.

I think the maximum time you can wait for a response depends much more on 
where you are calling from (e.g. a load balancer, or a human looking at a 
dashboard) than on a fixed set of tags or a particular check (hence configuring 
this per check or per tag does not make much sense IMHO). So at the moment, 
timeout handling is done by using
* a global default
* a per-call setting when using the HC Executor (e.g. the request param 
"timeout" of the HC servlet, which sets the HC executor option at 
https://github.com/apache/sling/blob/eecc7e401a0894984a5eaa8992dedfcb5a18e0e5/bundles/extensions/healthcheck/core/src/main/java/org/apache/sling/hc/api/execution/HealthCheckExecutionOptions.java#L26)
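
The two levels listed above (a global default plus an optional per-call override) could be sketched roughly as follows. This is a standalone illustration using only java.util.concurrent, not the Sling HC API: the class, the method names, the string results and the 2000 ms default are all assumptions made for the example.

```java
import java.util.OptionalLong;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch, not the Sling HC executor: the caller decides how
// long it is willing to wait, falling back to a global default.
final class PerCallTimeoutSketch {

    // Assumed global default; the real value is a configuration matter.
    static final long GLOBAL_DEFAULT_TIMEOUT_MS = 2000;

    // A per-call override wins over the global default when present.
    static long effectiveTimeoutMs(OptionalLong perCallOverrideMs) {
        return perCallOverrideMs.orElse(GLOBAL_DEFAULT_TIMEOUT_MS);
    }

    // Runs the check on a worker thread and waits at most the effective timeout.
    static String runWithTimeout(Callable<String> check, OptionalLong perCallOverrideMs) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<String> future = pool.submit(check);
            try {
                return future.get(effectiveTimeoutMs(perCallOverrideMs), TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                future.cancel(true); // interrupt the worker if it is still running
                return "TIMED_OUT";
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return "FAILED";
            } catch (ExecutionException e) {
                return "FAILED"; // the check itself threw an exception
            }
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // A dashboard user might accept the default wait.
        System.out.println(runWithTimeout(() -> "OK", OptionalLong.empty()));
        // A load balancer might pass a much shorter per-call timeout.
        System.out.println(runWithTimeout(() -> {
            Thread.sleep(5000);
            return "OK";
        }, OptionalLong.of(50)));
    }
}
```

The point of the design is that the same check can be called with different timeouts by different callers, without any per-check or per-tag timeout configuration.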



> SlingRequestStatusHealthCheck should add timeout support
> --------------------------------------------------------
>
>                 Key: SLING-5867
>                 URL: https://issues.apache.org/jira/browse/SLING-5867
>             Project: Sling
>          Issue Type: Bug
>          Components: Health Check
>    Affects Versions: Health Check Support 1.0.4
>            Reporter: Konrad Windszus
>            Assignee: Konrad Windszus
>
> Currently {{o.a.s.hc.support.impl.SlingRequestStatusHealthCheck}} just 
> synchronously calls {{SlingRequestProcessor.processResponse}}.
> That means in case of a non-returning response (e.g. caused by a deadlock 
> like SLING-5847) the health check will just timeout but never actually really 
> fail (even after a very long time).
> In this case it is good to create a dedicated timeout handling within the 
> {{SlingRequestStatusHealthCheck}} (separate from the timeout in 
> {{HealthCheckExecutorImpl}}) because for each individual request health check 
> configuration you might want to set different timeouts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
