[
https://issues.apache.org/jira/browse/SLING-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845234#comment-13845234
]
Bertrand Delacretaz commented on SLING-3278:
--------------------------------------------
I think the tentative spec at [1] covers async execution and caching, slightly
reworked here to take Georg's suggestions into account:
# Executing a health check via the HealthCheckExecutor is guaranteed to return
a Result in at most T msec. T is configurable and can be overridden in the
execution call.
# If the actual health check execution takes longer than T, the executor
returns the last result that was previously computed and cached, or an empty
result with state=NODATA (new state) if we don't have that yet.
# The executor service prevents concurrent execution of a given health check.
# Execution of a health check times out after U msec, configurable, and returns
a Result that indicates the timeout (with a new state? TBD)
With this you don't need an async property on a health check service: the
decision to return its last cached result is based on its actual execution
time. The behavior switch based on execution time > T can be implemented using
Future.get(timeout) and catching the TimeoutException to return the cached
result.
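As a rough illustration of the Future.get(timeout) idea, here is a minimal
Java sketch. The class CachingExecutorSketch, its execute() method and the use
of plain strings as results are all hypothetical and not the Sling API; the
hard timeout U from the list above is left out to keep it short.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch only: results are plain strings, not Sling Result objects. */
class CachingExecutorSketch {
    static final String NODATA = "NODATA"; // stand-in for the proposed new state

    // Daemon threads so a stuck check cannot keep the JVM alive.
    private final ExecutorService pool = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    });
    private final Map<String, String> lastResult = new ConcurrentHashMap<>();
    private final Map<String, Future<String>> running = new HashMap<>();

    /** Starts the check unless an execution of the same check is still in flight. */
    private synchronized Future<String> submitOnce(String name, Callable<String> check) {
        Future<String> f = running.get(name);
        if (f == null || f.isDone()) {
            f = pool.submit(() -> {
                String r = check.call();
                lastResult.put(name, r); // cache before the future completes
                return r;
            });
            running.put(name, f);
        }
        return f;
    }

    /** Returns within roughly tMsec: fresh result, else cached result, else NODATA. */
    String execute(String name, Callable<String> check, long tMsec) {
        Future<String> f = submitOnce(name, check);
        try {
            return f.get(tMsec, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Check is slower than T: it keeps running, we answer from the cache.
            return lastResult.getOrDefault(name, NODATA);
        } catch (ExecutionException e) {
            return "CRITICAL: " + e.getCause();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return lastResult.getOrDefault(name, NODATA);
        }
    }
}
```

In this sketch a slow check keeps running in the background after the caller
has been served from the cache, so a later call can pick up its result without
starting a second concurrent execution.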
I also suggest adding timing info to the Result: creation time, optional time
to live and execution duration, to manage cache expiration and provide
freshness info and execution statistics.
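To make the timing fields concrete, a possible shape is sketched below. The
class name TimedResult and its field and method names are assumptions made for
illustration, not part of the existing Result API.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;

/** Hypothetical value object carrying the suggested timing info. */
final class TimedResult {
    final String status;                 // e.g. "OK", "WARN", "NODATA"
    final Instant created;               // creation time
    final Duration executionDuration;    // how long the check took to run
    final Optional<Duration> timeToLive; // optional; empty means "never expires"

    TimedResult(String status, Instant created, Duration executionDuration,
                Optional<Duration> timeToLive) {
        this.status = status;
        this.created = created;
        this.executionDuration = executionDuration;
        this.timeToLive = timeToLive;
    }

    /** Cache expiration: true once now is past created + timeToLive. */
    boolean isExpired(Instant now) {
        return timeToLive.map(ttl -> now.isAfter(created.plus(ttl))).orElse(false);
    }

    /** Freshness info: how old the result is at the given instant. */
    Duration age(Instant now) {
        return Duration.between(created, now);
    }
}
```

Passing the current time in as a parameter, rather than reading the clock
inside the class, keeps expiration checks easy to test.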
[1]
http://apache-sling.73963.n3.nabble.com/Health-checks-execution-service-td4028477.html
> Provide a HealthCheckExecutorService
> ------------------------------------
>
> Key: SLING-3278
> URL: https://issues.apache.org/jira/browse/SLING-3278
> Project: Sling
> Issue Type: New Feature
> Components: Health Check
> Reporter: Georg Henzler
>
> Goals:
> * Be able to get an overall (aggregated) result as quickly as possible
> (ideally <2sec)
> * Whenever possible, return most current results (e.g. for a memory check)
> * Provide a declarative way for async checks (async checks should be the
> exception though)
> Approach
> * Run checks in parallel
> * Make sure long running (or even stuck) checks are timed out
> * If a health check must run asynchronously (because its execution time
> cannot be optimized), it should be enough to just specify a service property
> (e.g. "hc.async").
> See also
> http://apache-sling.73963.n3.nabble.com/Health-Check-Improvements-td4029330.html#a4029402
> http://apache-sling.73963.n3.nabble.com/Health-checks-execution-service-td4028477.html