Hi

Apart from JMX (which is a separate beast), I agree that we have to think about fixing the long-running check issue.
How about a "LongRunningHealthCheck" service? Such services would be picked up by the HealthCheck infrastructure and executed as tasks (maybe the service properties could even provide scheduling properties). The HealthCheck infrastructure would then register a HealthCheck service object providing the most recent results from the LongRunningHealthCheck tasks.

Regards
Felix

--
Felix Meschberger | Principal Scientist | Adobe

On 28.10.2013 at 10:11, Bertrand Delacretaz <[email protected]> wrote:

> Hi,
>
> Looking at SLING-3207 I think this deserves a bit more discussion: I
> don't think this is only about JMX, and providing an executor service
> that takes care of caching and async execution can help make the
> individual health checks simpler.
>
> From a client's point of view I would suggest the following behavior:
>
> 1) Executing a health check (via a new HealthCheckExecutor service
> that we'll add) is guaranteed to take at most T msec (configurable)
>
> 2) If an individual health check's execute() method takes longer than
> T, the executor returns the last result that was previously computed,
> or an empty result with state=NODATA if we don't have that yet. The
> Result contains the timestamp of when it was computed.
>
> 3) The executor service prevents concurrent execution of a given health check
>
> This is very similar to how an HTTP cache works, except that in 2) we
> return an old result instead of waiting.
>
> With this we can drop the "execute() method must be fast" requirement
> (within reasonable bounds) which can simplify the actual health check
> implementations.
>
> WDYT?
> -Bertrand
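
To make the three quoted points concrete, here is a rough sketch in plain Java of how such an executor could behave. It is only an illustration under assumptions, not the Sling API: the names TimeBoundedExecutor, State and Result are hypothetical, checks are modelled as a Callable, and mapping a thrown exception to CRITICAL is my own choice rather than anything agreed in this thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch of the proposed executor; not an existing Sling class. */
class TimeBoundedExecutor {
    enum State { OK, CRITICAL, NODATA }

    /** A result plus the timestamp of when it was computed (point 2). */
    static final class Result {
        final State state;
        final long computedAt;
        Result(State state, long computedAt) {
            this.state = state;
            this.computedAt = computedAt;
        }
    }

    private final long timeoutMsec;
    // Daemon threads so that a slow check never keeps the JVM alive.
    private final ExecutorService pool = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    });
    private final Map<String, Result> lastResults = new ConcurrentHashMap<>();
    private final Map<String, Future<?>> running = new HashMap<>();

    TimeBoundedExecutor(long timeoutMsec) {
        this.timeoutMsec = timeoutMsec;
    }

    Result execute(String name, Callable<State> check) {
        Future<?> future;
        synchronized (running) {
            // Point 3: start a new run only if none is in flight for this check.
            future = running.get(name);
            if (future == null || future.isDone()) {
                future = pool.submit(() -> {
                    State state;
                    try {
                        state = check.call();
                    } catch (Exception e) {
                        state = State.CRITICAL; // assumption: a failing check is critical
                    }
                    lastResults.put(name, new Result(state, System.currentTimeMillis()));
                });
                running.put(name, future);
            }
        }
        try {
            // Point 1: wait at most T msec for the run to finish.
            future.get(timeoutMsec, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            // Too slow (or failed badly): fall through to the cached result.
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // Point 2: the last computed result, or NODATA if there is none yet.
        return lastResults.getOrDefault(name, new Result(State.NODATA, 0L));
    }
}
```

A caller would invoke something like executor.execute("repository", () -> State.OK). A check that overruns T keeps running in the background, and its result is returned by a later call once it has completed.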
