While starting to use the new health check support, we came across a few things that I would like to discuss.
According to the API, health checks are expected to execute quickly, which is fine. However, nothing prevents a slow health check from being registered. I'm not sure if we should do this, but we could, for example, blacklist long-running health checks after their first invocation, similar to how the EventAdmin blacklists event handlers that take too long.

This gets even trickier because health checks are registered as MBeans that expose only attributes and no operations. The assumption here is that whenever the MBean is triggered (that is, an attribute value is fetched), the health check is executed. This is fine as long as the health check executes fast and the client takes this into account: if the client fetches all available attributes in one call, the hc is executed only once, but if it fetches the attributes one after the other, the hc is executed on each attribute fetch. Now combine this with a long-running health check.

This brings me to the topic of concurrent invocations. If a health check executes fast, this shouldn't be a problem; if it doesn't, concurrent invocations might lead to problems. Imagine N users checking the health of the system at the same time, or monitoring agents regularly fetching the status. Maybe the execution should be synchronized instead?

And finally, for long-running health checks, whether they run synchronously or asynchronously, users would like to see a progress bar while the hc runs.

All of this can be solved easily if we stick to "health check execution should be fast and not expensive". In that case we might add blacklisting. Things like a progress bar have to be done through whatever mechanism is used to execute the hc asynchronously.

WDYT?

Carsten
-- 
Carsten Ziegeler
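P.S. A minimal sketch of the "fast and not expensive" approach, assuming a hypothetical wrapper class `GuardedCheck` around a plain `Callable<String>` (this is not the actual Sling health check API). It serializes concurrent invocations, caches the last result for a short time so that fetching MBean attributes one by one does not re-run the check, and blacklists the check once a single run exceeds its time budget, roughly the way the EventAdmin treats slow handlers. The timeout, the cache duration and the `BLACKLISTED`/`ERROR` result strings are illustrative assumptions.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/**
 * Hypothetical guard around a health check; not part of the Sling API.
 * - synchronized execute(): concurrent callers run the check one at a time
 * - short result cache: per-attribute MBean fetches reuse one execution
 * - blacklisting: a run exceeding the time budget disables the check
 */
class GuardedCheck {
    private final Callable<String> check;
    private final long timeoutMillis;
    private final long cacheNanos;
    // Daemon worker thread so a stuck check never keeps the JVM alive.
    private final ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "guarded-health-check");
        t.setDaemon(true);
        return t;
    });

    private boolean blacklisted;   // guarded by "this"
    private String lastResult;     // guarded by "this"
    private long lastRunNanos;     // guarded by "this"

    GuardedCheck(Callable<String> check, long timeoutMillis, long cacheMillis) {
        this.check = check;
        this.timeoutMillis = timeoutMillis;
        this.cacheNanos = TimeUnit.MILLISECONDS.toNanos(cacheMillis);
    }

    public synchronized String execute() {
        if (blacklisted) {
            return "BLACKLISTED";
        }
        // Reuse a recent result instead of running the check again.
        if (lastResult != null && System.nanoTime() - lastRunNanos < cacheNanos) {
            return lastResult;
        }
        Future<String> future = worker.submit(check);
        try {
            lastResult = future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            lastRunNanos = System.nanoTime();
            return lastResult;
        } catch (TimeoutException e) {
            // Too slow: interrupt the run and never execute this check again.
            future.cancel(true);
            blacklisted = true;
            worker.shutdownNow();
            return "BLACKLISTED";
        } catch (ExecutionException e) {
            return "ERROR: " + e.getCause();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "INTERRUPTED";
        }
    }

    public synchronized boolean isBlacklisted() {
        return blacklisted;
    }
}
```

Holding the lock while waiting means N concurrent callers queue up behind one run; with the cache enabled, callers arriving right after a run simply get its result. A progress bar for genuinely long-running checks would still need a separate asynchronous mechanism, as noted above.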
