Hi,

Looking at SLING-3207, I think this deserves a bit more discussion: I don't think this is only about JMX, and providing an executor service that takes care of caching and async execution can help make the individual health checks simpler.

From a client's point of view I would suggest the following behavior:

1) Executing a health check (via a new HealthCheckExecutor service that we'll add) is guaranteed to take at most T msec (configurable).

2) If an individual health check's execute() method takes longer than T, the executor returns the last result that was previously computed, or an empty result with state=NODATA if we don't have that yet. The Result contains the timestamp of when it was computed.

3) The executor service prevents concurrent execution of a given health check.

This is very similar to how an HTTP cache works, except that in 2) we return an old result instead of waiting.

With this we can drop the "execute() method must be fast" requirement (within reasonable bounds), which can simplify the actual health check implementations.

WDYT?

-Bertrand
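
PS: to make points 1) to 3) more concrete, here is a rough sketch of how such an executor might behave. This is only an illustration under my own assumptions, not a proposed API: the class name CachingHealthCheckExecutor, the State and Result types, the use of a Supplier for the check, and a timestamp of 0 for NODATA are all made up for the example. Reporting CRITICAL when a check throws is also an assumption; the three points above do not cover that case.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical sketch only; none of these names exist in Sling today.
class CachingHealthCheckExecutor {

    enum State { OK, WARN, CRITICAL, NODATA }

    static final class Result {
        final State state;
        final long computedAt; // msec since epoch; 0 means "never computed" (assumption)

        Result(State state, long computedAt) {
            this.state = state;
            this.computedAt = computedAt;
        }
    }

    private final long timeoutMsec; // the configurable T
    // Daemon threads so that a hanging check cannot keep the JVM alive.
    private final ExecutorService pool = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "health-check");
        t.setDaemon(true);
        return t;
    });
    private final Map<String, Result> lastResults = new ConcurrentHashMap<>();
    // Executions currently in flight, keyed by check name; guarded by its own monitor.
    private final Map<String, Future<Result>> running = new HashMap<>();

    CachingHealthCheckExecutor(long timeoutMsec) {
        this.timeoutMsec = timeoutMsec;
    }

    Result execute(String name, Supplier<State> check) {
        Future<Result> future;
        synchronized (running) {
            // 3) Join the execution already in flight instead of starting a second one.
            future = running.get(name);
            if (future == null) {
                Callable<Result> task = () -> run(name, check);
                future = pool.submit(task);
                running.put(name, future);
            }
        }
        try {
            // 1) Never wait longer than T msec.
            return future.get(timeoutMsec, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // 2) Too slow: return the previous result, or NODATA if there is none.
            return lastOrNoData(name);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return lastOrNoData(name);
        } catch (ExecutionException e) {
            // Assumption: a check that throws is reported as CRITICAL.
            return new Result(State.CRITICAL, System.currentTimeMillis());
        }
    }

    private Result run(String name, Supplier<State> check) {
        try {
            Result r = new Result(check.get(), System.currentTimeMillis());
            lastResults.put(name, r); // cached even if the caller has already timed out
            return r;
        } finally {
            synchronized (running) {
                running.remove(name);
            }
        }
    }

    private Result lastOrNoData(String name) {
        Result last = lastResults.get(name);
        return last != null ? last : new Result(State.NODATA, 0L);
    }
}
```

A slow check keeps running in the background after the caller has timed out, so its result is there for the next call. That is the part that differs from a plain HTTP cache.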
