[ 
https://issues.apache.org/jira/browse/SLING-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15385447#comment-15385447
 ] 

Georg Henzler commented on SLING-5867:
--------------------------------------

Correct, stopping/interrupting an HC thread can be dangerous (e.g. it could
corrupt the repository), which is why this is never done. But the HC executor
ensures that per HC there is at most one thread running, even if that one
thread hangs indefinitely (if a future already exists for a particular HC, it
is always reused).
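
To illustrate the "at most one thread per HC" behaviour, here is a minimal 
sketch of reusing an in-flight future. It is not the actual 
{{HealthCheckExecutorImpl}} code; the class {{SingleFlightExecutor}}, its 
{{submit}} method and the use of a plain String as result are hypothetical 
names chosen only for this example.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

// Hypothetical helper for illustration, not the Sling implementation.
class SingleFlightExecutor {
    // One entry per health check that currently has a task in flight.
    private final ConcurrentMap<String, FutureTask<String>> running = new ConcurrentHashMap<>();
    private final ExecutorService pool;

    SingleFlightExecutor(ExecutorService pool) {
        this.pool = pool;
    }

    /** Returns the in-flight future for hcName and starts a new task only if none exists. */
    Future<String> submit(String hcName, Callable<String> check) {
        FutureTask<String> task = new FutureTask<>(() -> {
            try {
                return check.call();
            } finally {
                // Allow a fresh execution once this one has finished.
                running.remove(hcName);
            }
        });
        FutureTask<String> existing = running.putIfAbsent(hcName, task);
        if (existing != null) {
            // A (possibly hanging) execution already exists: reuse it, start no second thread.
            return existing;
        }
        pool.execute(task);
        return task;
    }
}
```

With this approach a hanging check keeps occupying exactly one thread, and 
later callers get the same future back instead of piling up new threads. The 
hanging thread itself is never interrupted.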

But to come back to the initial issue: it could be useful to make maximum
execution time thresholds configurable for the request status health check
(e.g. {{maxExecutionTimeThresholdWarnInMs}} and
{{maxExecutionTimeThresholdCriticalInMs}}). In theory the setting could be per
line of the {{paths}} config, but to benefit from parallel execution, and since
SlingRequestStatusHealthCheck has {{configurationFactory=true}}, it would be
more useful to have these as root-level configuration values of
SlingRequestStatusHealthCheck. If {{maxExecutionTimeThresholdCriticalInMs}}
is longer than the timeout set in the executor for a particular request
status config, that health check can always be set to async to avoid timeouts.
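
A rough sketch of how such thresholds could be evaluated, to make the proposal 
more concrete. Only the two property names come from the suggestion above; the 
class {{ExecutionTimeThresholds}}, its {{Status}} enum and its methods are 
hypothetical and not part of any existing Sling API.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the proposed WARN/CRITICAL thresholds.
class ExecutionTimeThresholds {
    enum Status { OK, WARN, CRITICAL }

    private final long warnMs;     // would map to maxExecutionTimeThresholdWarnInMs
    private final long criticalMs; // would map to maxExecutionTimeThresholdCriticalInMs

    ExecutionTimeThresholds(long warnMs, long criticalMs) {
        if (warnMs < 0 || criticalMs < warnMs) {
            throw new IllegalArgumentException("expected 0 <= warn <= critical");
        }
        this.warnMs = warnMs;
        this.criticalMs = criticalMs;
    }

    /** Maps a measured execution time to a status. */
    Status classify(long elapsedMs) {
        if (elapsedMs >= criticalMs) {
            return Status.CRITICAL;
        }
        if (elapsedMs >= warnMs) {
            return Status.WARN;
        }
        return Status.OK;
    }

    /** Runs the request synchronously and classifies how long it took. */
    Status time(Runnable request) {
        long start = System.nanoTime();
        request.run();
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        return classify(elapsedMs);
    }
}
```

Note that in this sketch the time is only classified after the request has 
returned, so a request that never returns would still depend on the executor 
timeout (or on async execution).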

[~kwin] What about renaming this issue to "SlingRequestStatusHealthCheck should
support WARN/CRITICAL thresholds for maximum execution time"?

> SlingRequestStatusHealthCheck should add timeout support
> --------------------------------------------------------
>
>                 Key: SLING-5867
>                 URL: https://issues.apache.org/jira/browse/SLING-5867
>             Project: Sling
>          Issue Type: Bug
>          Components: Health Check
>    Affects Versions: Health Check Support 1.0.4
>            Reporter: Konrad Windszus
>            Assignee: Konrad Windszus
>
> Currently {{o.a.s.hc.support.impl.SlingRequestStatusHealthCheck}} just 
> synchronously calls {{SlingRequestProcessor.processResponse}}.
> That means in case of a non-returning response (e.g. caused by a deadlock 
> like SLING-5847) the health check will just time out but never actually 
> fail (even after a very long time).
> In this case it is good to create a dedicated timeout handling within the 
> {{SlingRequestStatusHealthCheck}} (separate from the timeout in 
> {{HealthCheckExecutorImpl}}) because for each individual request health check 
> configuration you might want to set different timeouts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
