While starting to use the new health checks we came across several
things which I would like to discuss.

According to the API, health checks are expected to execute quickly,
which is fine. However, nothing enforces this. I'm not sure whether we
should, but the EventAdmin, for example, blacklists long-running event
handlers after their first invocation.
This gets even trickier because health checks are registered as MBeans
with only attributes and no operations. The assumption is that whenever
the MBean is triggered (an attribute value is fetched), the health check
is executed. This is fine as long as the health check executes quickly
and the client is aware of this behaviour. If the client fetches all
available attributes in one call, the hc is executed only once. If the
client fetches the attributes one after the other, the hc is executed on
each attribute fetch. Now combine this with a long-running health check.
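
To make the attribute problem concrete, here is a minimal sketch. It
does not use the real JMX or health check API; HealthCheckAttributes and
its two methods are hypothetical stand-ins for an attribute-only MBean
that runs the check on every read.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical stand-in for an attribute-only health check MBean. */
class HealthCheckAttributes {
    // Assumption: one execution of the check yields all attribute values.
    private final Supplier<Map<String, Object>> check;

    HealthCheckAttributes(Supplier<Map<String, Object>> check) {
        this.check = check;
    }

    /** Single-attribute fetch: runs the whole check to return one value. */
    Object getAttribute(String name) {
        return check.get().get(name);
    }

    /** Bulk fetch: runs the check once and reads all values from that result. */
    Map<String, Object> getAttributes(String... names) {
        Map<String, Object> result = check.get();
        Map<String, Object> selected = new LinkedHashMap<>();
        for (String name : names) {
            selected.put(name, result.get(name));
        }
        return selected;
    }
}
```

With three attributes, a client reading them one by one triggers three
executions of the check, while a bulk read triggers only one.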

This brings me to the topic of concurrent invocations. Assuming a health
check executes quickly, this shouldn't be a problem; if it doesn't,
concurrent invocations might lead to problems. Imagine N users checking
the health of the system at the same time, or monitoring agents
regularly fetching the status. Maybe the execution should rather be
synchronized?
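
As a rough sketch of what "synchronized" could mean here:
SynchronizedHealthCheck is a hypothetical wrapper, not part of the
current API, that lets at most one execution of a given check run at a
time.

```java
import java.util.function.Supplier;

/** Hypothetical wrapper that serializes executions of a single health check. */
class SynchronizedHealthCheck {
    private final Supplier<String> check;
    private final Object lock = new Object();

    SynchronizedHealthCheck(Supplier<String> check) {
        this.check = check;
    }

    /** Callers queue up on the lock, so the check never runs concurrently. */
    String execute() {
        synchronized (lock) {
            return check.get();
        }
    }
}
```

Note that this only prevents overlapping executions. With N concurrent
callers the last one still waits for N runs, so it does not help with a
long-running check.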

And finally, for long-running health checks, whether they run
synchronously or asynchronously, users would like to see a progress bar
while the hc runs.

All of this can be solved easily if we stick to "health check execution
should be fast and not expensive". In that case we might add
blacklisting. Things like a progress bar would then have to be provided
by whatever mechanism is used to execute the hc asynchronously.
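
For the blacklisting idea, something along these lines might work. This
is only a sketch under my own assumptions: BlacklistingExecutor, the
timeout value and the string results are made up for illustration and
are not how the EventAdmin or the health check API do it.

```java
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical guard: runs checks with a time budget and blacklists slow ones. */
class BlacklistingExecutor {
    private final Set<String> blacklist = ConcurrentHashMap.newKeySet();
    private final ExecutorService pool = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "hc-runner");
        t.setDaemon(true); // a stuck check must not keep the JVM alive
        return t;
    });
    private final long timeoutMillis;

    BlacklistingExecutor(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Returns the check's result, or a marker string if it is slow or fails. */
    String execute(String name, Callable<String> check) {
        if (blacklist.contains(name)) {
            return "BLACKLISTED"; // not invoked again after its first timeout
        }
        Future<String> future = pool.submit(check);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // try to interrupt the runaway check
            blacklist.add(name);
            return "TIMED_OUT";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "INTERRUPTED";
        } catch (ExecutionException e) {
            return "FAILED: " + e.getCause();
        }
    }

    boolean isBlacklisted(String name) {
        return blacklist.contains(name);
    }
}
```

A check that exceeds the budget once is skipped on all later calls, so
fetching attributes one after the other can no longer stack up several
slow executions.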

WDYT?

Carsten
-- 
Carsten Ziegeler
[email protected]
