[
https://issues.apache.org/jira/browse/SLING-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15385447#comment-15385447
]
Georg Henzler commented on SLING-5867:
--------------------------------------
Correct, stopping or interrupting an HC thread can be dangerous (e.g. it could
corrupt the repository), which is why this is never done. But the HC executor
ensures that per HC there is at most one thread running, even if that one
thread hangs indefinitely (if a future already exists for a particular HC, it
is always reused).
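To illustrate the "at most one thread per HC" behaviour, here is a minimal sketch. It is not the actual {{HealthCheckExecutorImpl}} code; the class {{SingleFlightExecutor}} and its {{submit}} method are hypothetical names used only for this example.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical illustration only; not the real HealthCheckExecutorImpl.
class SingleFlightExecutor {
    private final ExecutorService pool;
    // One entry per health check name, holding the most recently submitted future.
    private final Map<String, Future<String>> running = new ConcurrentHashMap<>();

    SingleFlightExecutor(ExecutorService pool) {
        this.pool = pool;
    }

    /**
     * Returns the in-flight future for the given check if one exists,
     * otherwise submits the check and remembers the new future.
     */
    Future<String> submit(String checkName, Callable<String> check) {
        // compute() is atomic per key, so two callers cannot both start a thread.
        return running.compute(checkName, (name, existing) -> {
            if (existing != null && !existing.isDone()) {
                // Still running (possibly hanging): reuse it, never interrupt it.
                return existing;
            }
            return pool.submit(check);
        });
    }
}
```

With this shape a hanging check occupies exactly one thread: later calls for the same name get the same future back and no thread is ever cancelled.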
But to come back to the initial issue: It could be useful to make maximum
execution time thresholds configurable for the request status health check
(e.g. {{maxExecutionTimeThresholdWarnInMs}} and
{{maxExecutionTimeThresholdCriticalInMs}}). In theory the setting could be per
entry of the {{paths}} config, but to benefit from parallel execution, and since
SlingRequestStatusHealthCheck has {{configurationFactory=true}}, it would be
more useful to have these as root-level configuration values of
SlingRequestStatusHealthCheck. If {{maxExecutionTimeThresholdCriticalInMs}}
is longer than the timeout set in the executor for a particular request
status config, that health check can always be set to async to avoid timeouts.
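A rough sketch of how the two proposed thresholds could map a measured execution time to a result status follows. This is only an assumption about the intended semantics, not existing SlingRequestStatusHealthCheck code; the class {{ExecutionTimeThresholds}}, its {{Status}} enum and the {{classify}} method are hypothetical.

```java
// Hypothetical sketch of the proposed WARN/CRITICAL thresholds.
class ExecutionTimeThresholds {
    enum Status { OK, WARN, CRITICAL }

    private final long warnInMs;      // would come from maxExecutionTimeThresholdWarnInMs
    private final long criticalInMs;  // would come from maxExecutionTimeThresholdCriticalInMs

    ExecutionTimeThresholds(long warnInMs, long criticalInMs) {
        if (warnInMs < 0 || warnInMs > criticalInMs) {
            throw new IllegalArgumentException(
                "expected 0 <= warn threshold <= critical threshold");
        }
        this.warnInMs = warnInMs;
        this.criticalInMs = criticalInMs;
    }

    /** Maps the elapsed time of one request to a status (assumed inclusive bounds). */
    Status classify(long elapsedMs) {
        if (elapsedMs >= criticalInMs) {
            return Status.CRITICAL;
        }
        if (elapsedMs >= warnInMs) {
            return Status.WARN;
        }
        return Status.OK;
    }
}
```

For example, with a WARN threshold of 500 ms and a CRITICAL threshold of 2000 ms, a request taking 800 ms would be reported as WARN under these assumptions.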
[~kwin] What about renaming this issue to "SlingRequestStatusHealthCheck should
support WARN/CRITICAL thresholds for maximum execution time"?
> SlingRequestStatusHealthCheck should add timeout support
> --------------------------------------------------------
>
> Key: SLING-5867
> URL: https://issues.apache.org/jira/browse/SLING-5867
> Project: Sling
> Issue Type: Bug
> Components: Health Check
> Affects Versions: Health Check Support 1.0.4
> Reporter: Konrad Windszus
> Assignee: Konrad Windszus
>
> Currently {{o.a.s.hc.support.impl.SlingRequestStatusHealthCheck}} just
> synchronously calls {{SlingRequestProcessor.processResponse}}.
> That means in case of a non-returning response (e.g. caused by a deadlock
> like SLING-5847) the health check will just time out but never actually
> fail (even after a very long time).
> In this case it is good to create a dedicated timeout handling within the
> {{SlingRequestStatusHealthCheck}} (separate from the timeout in
> {{HealthCheckExecutorImpl}}) because for each individual request health check
> configuration you might want to set different timeouts.