Re: ResultRegistry for Health Checks -> StickyResults instead?

Georg Henzler Tue, 06 Jun 2017 16:03:06 -0700

Hi,

The goal is to declare health check results that remain valid for a
specified time or forever.

So I agree that metrics as proposed in comment [1] cannot achieve this
(they are limited to 1, 5 and 15 minute time windows). However, I still
think a purely declarative approach is cleaner and will lead to more
consistency across HCs: we could introduce a HC property
"hc.keepWarnStickyForMin" (and "hc.keepCriticalStickyForMin"). This can
be implemented entirely in the impl package and would not require a new
API. For the "Event queue overflown" example, the property
hc.keepWarnStickyForMin=Integer.MAX_VALUE could be set, and the HC
executor could then append a result as follows:


INFO Checking Event Queue...
INFO Event Queue is currently fine.
WARN --- Sticky result from 2017-06-07 11:49 ---
INFO Checking Event Queue...
WARN Event Queue overloaded!

This means the full log of both the current result and a historic
sticky result would be shown (timeout handling already works in a
similar way: if a HC times out, the last available HC result is shown).
The HC executor has all the necessary metadata (the time is recorded in
the execution result), so this would be easy to add. The best part is
that you can change the sticky time and the "stickiness" by
configuration only - no redeployment needed :)
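
As a rough sketch of the check the executor would need: the class and
method names below are hypothetical and not part of the existing HC
API; only the idea of a "keep sticky for N minutes" property comes from
the proposal above. It decides whether a historic non-ok result is
still within its sticky window:

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper, NOT part of the Sling Health Check API.
// It models a value such as hc.keepWarnStickyForMin from the proposal.
final class StickyWindow {

    private final long keepStickyForMin;

    StickyWindow(long keepStickyForMin) {
        this.keepStickyForMin = keepStickyForMin;
    }

    /**
     * Returns true if a non-ok result that finished at finishedAt
     * should still be appended to the current result at time now.
     * A value of zero or less disables stickiness (an assumption).
     */
    boolean isStillSticky(Instant finishedAt, Instant now) {
        if (keepStickyForMin <= 0) {
            return false;
        }
        Duration age = Duration.between(finishedAt, now);
        return age.compareTo(Duration.ofMinutes(keepStickyForMin)) < 0;
    }
}
```

With keepStickyForMin set to Integer.MAX_VALUE the window is roughly
4000 years, which is "forever" for practical purposes.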


WDYT?

Best Regards
Georg

[1] https://issues.apache.org/jira/browse/SLING-6855?focusedCommentId=16010189&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16010189


For example, a quota has been tripped - warn for 30 minutes.

Or an events queue overflowed and the instance is considered damaged -
raise a critical alarm forever.

With the current SLING-6855 one can raise such alarms but they are all
grouped in a single health check - doing this results in that HC
having both A and B tags and returning two results:

  ResultRegistry reg = sling.getService(ResultRegistry.class);
  reg.put("testA", new Result(Result.Status.CRITICAL, "It's critical"), null, "A");
  reg.put("testB", new Result(Result.Status.WARN, "B is just a warning"), null, "B");

So if you query for tag B you get both results, although they are
unrelated.


I would prefer creating one HC for each such alarm, and rename the
service StickyResults instead of ResultRegistry.

So the above example (with service interface renamed) would cause two
HCs to be created:

1) StickyResult (testA) ; status CRITICAL, message "it's critical", tag A
2) StickyResult (testB) ; status WARN, message "B is just a warning", tag B


The HCs are keyed based on the "identifier" parameter, so in the above
example putting another "testB" overwrites the existing one.
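
To illustrate the keying, here is a minimal sketch. It is not the
SLING-6855 code: the class, its nested types and the findByTag method
are hypothetical stand-ins, and it is not thread-safe. Each identifier
maps to exactly one entry, so a second put for "testB" replaces the
first, and a query for tag B only returns the B entry:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the proposed StickyResults service.
final class StickyResultsSketch {

    enum Status { OK, WARN, CRITICAL }

    // One entry corresponds to one generated HC in the proposal.
    static final class Entry {
        final Status status;
        final String message;
        final String tag;

        Entry(Status status, String message, String tag) {
            this.status = status;
            this.message = message;
            this.tag = tag;
        }
    }

    // Keyed by identifier: putting the same identifier again overwrites.
    private final Map<String, Entry> byIdentifier = new LinkedHashMap<>();

    void put(String identifier, Status status, String message, String tag) {
        byIdentifier.put(identifier, new Entry(status, message, tag));
    }

    // Returns only the entries carrying the given tag.
    List<Entry> findByTag(String tag) {
        List<Entry> matches = new ArrayList<>();
        for (Entry e : byIdentifier.values()) {
            if (e.tag.equals(tag)) {
                matches.add(e);
            }
        }
        return matches;
    }

    int size() {
        return byIdentifier.size();
    }
}
```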

Clint and others, WDYT?

-Bertrand
