Hi, On Tue, Oct 22, 2013 at 1:06 PM, Carsten Ziegeler <[email protected]> wrote: > ...According to the API health checks are considered to execute quickly - > which is fine. However there is no prevention against it. I'm not sure if > we should do this, but e.g. the EventAdmin blacklists long running health > checks after their first invocation...
I agree that preventing slow HealthCheck.execute() methods is a good idea. > ...This gets even more tricky as health checks are registered as mbeans with > only attributes and no methods... Right, enforcing fast execution looks like the right thing to do. > ...All of this can be solved easily, if we stick to "health check execution > should be fast and not expensive". In that case we might add black listing. > Things like a progress bar etc. have to be done through whatever mechanism > is used to execute the hc asynchronously.... Instead of permanent blacklisting I'd suggest returning a normal Result but with a TIMEOUT status. For example, a health check that causes lots of initializations (maybe because it's called right after Sling startup) might be quite slow on the first call, and then fast, so TIMEOUT on first call and actual results (maybe computed asynchronously) later makes sense, but permanent blacklisting would get in the way. I suggest implementing a timeout on the HealthCheck.execute() method (not sure how - HealthCheckExecutor service maybe) which returns a Result with a TIMEOUT state, that indicates how long the timeout was, and maybe a short term blacklisting of the HealthCheck, during which it returns a BLACKLISTED state result. WDYT? -Bertrand
