[ 
https://issues.apache.org/jira/browse/SLING-6855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16010286#comment-16010286
 ] 

Georg Henzler commented on SLING-6855:
--------------------------------------

[~bdelacretaz] 

Yes, that describes the idea pretty well. Maybe two things to add:

* I'm not sure whether a simple boolean for {{hc.generateMetrics}} would 
suffice, or whether we should make automated execution at a configurable 
interval possible. I guess automatic execution would not hurt because of the 
caching that is in place, but setups that measure the metrics will most likely 
call the HCs regularly anyway (e.g. from a load balancer).

bq. HCs can be configured to watch these metrics and complain if values are out 
of range.

*  I would not do that in the same HC that declares  {{hc.generateMetrics}} (to 
ensure we have separate state/logs for current and historical information). Two 
options exist:
** a new generic HC that is capable of monitoring any metrics (using a service 
factory) 
** another property {{hc.monitorMetrics}} that dynamically registers a "monitor 
HC" at runtime
The second option is less configuration work, but we may need the first 
option anyway in the future to monitor other metrics that are not produced by 
HCs.
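
To make the first option a bit more concrete, here is a minimal sketch of a generic "monitor" check that watches one metric and complains if the value is out of range. It is plain Java and does not use the Sling Health Check or metrics APIs: {{MetricMonitorSketch}}, its {{Status}} enum, the {{DoubleSupplier}} used as the metric source, and the two thresholds are all assumptions made for illustration only.

```java
import java.util.function.DoubleSupplier;

/** Hypothetical sketch, not Sling API: a generic check that watches one metric. */
public class MetricMonitorSketch {

    /** Simplified stand-in for a health check result status (assumption). */
    public enum Status { OK, WARN, CRITICAL }

    private final String metricName;
    private final DoubleSupplier metric;   // assumed source of the current metric value
    private final double warnAbove;        // values above this yield WARN
    private final double criticalAbove;    // values above this yield CRITICAL

    public MetricMonitorSketch(String metricName, DoubleSupplier metric,
                               double warnAbove, double criticalAbove) {
        if (criticalAbove < warnAbove) {
            throw new IllegalArgumentException("criticalAbove must be >= warnAbove");
        }
        this.metricName = metricName;
        this.metric = metric;
        this.warnAbove = warnAbove;
        this.criticalAbove = criticalAbove;
    }

    public String getMetricName() {
        return metricName;
    }

    /** Reads the metric once and maps the value to a status. */
    public Status check() {
        double value = metric.getAsDouble();
        if (value > criticalAbove) {
            return Status.CRITICAL;
        }
        if (value > warnAbove) {
            return Status.WARN;
        }
        return Status.OK;
    }
}
```

Because the monitor only depends on a value supplier, the same class could watch metrics produced by an HC or by any other code, which is the argument for the generic variant.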

> Create ResultRegistry to provide health check behavior for executing code 
> that does not want a HealthCheck
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: SLING-6855
>                 URL: https://issues.apache.org/jira/browse/SLING-6855
>             Project: Sling
>          Issue Type: New Feature
>          Components: Health Check
>            Reporter: Clinton H Goudie-Nice
>
> I want to provide a Registry service that can be leveraged to provide health 
> check results.
> These results can be for a period of time through an expiration, until the 
> JVM is restarted, or added and later removed.
> This can be useful when code observes a specific (possibly bad) state, and 
> wants to alert through the health check API that this state has taken place.
> Some examples:
> * An event pool has filled, and some events will be thrown away. This is a 
> failure case that requires a restart of the instance. It would be appropriate 
> to trigger a permanent failure.
> * A quota has been tripped. This quota may immediately recover, but it is 
> sensible to alert for 30 minutes that the quota has been tripped. If you 
> expect the failure will clear itself within a certain window, setting the 
> expiration to that window can be ideal.
> GHPR to follow
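
As a rough illustration of the registry described in the issue above (results that expire, results that last until restart, and results that are added and later removed), here is a minimal in-memory sketch. It is not the code from the announced pull request and not Sling API: the class {{ResultRegistrySketch}}, its method names, and the injected clock are hypothetical and chosen only to show the three lifetimes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical sketch, not Sling API: an in-memory registry of reported results. */
public class ResultRegistrySketch {

    private static final long NEVER = Long.MAX_VALUE;

    private static final class Entry {
        final String message;
        final long expiresAtMillis;

        Entry(String message, long expiresAtMillis) {
            this.message = message;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final LongSupplier clock;  // injected time source (assumption, eases testing)

    public ResultRegistrySketch(LongSupplier clock) {
        this.clock = clock;
    }

    /** Result that stays until the JVM restarts or it is removed explicitly. */
    public void putPermanent(String id, String message) {
        entries.put(id, new Entry(message, NEVER));
    }

    /** Result that is reported for ttlMillis and then dropped. */
    public void putTemporary(String id, String message, long ttlMillis) {
        entries.put(id, new Entry(message, clock.getAsLong() + ttlMillis));
    }

    /** Removes a previously added result, if present. */
    public void remove(String id) {
        entries.remove(id);
    }

    /** Purges expired entries and returns the messages that are still active. */
    public List<String> activeMessages() {
        long now = clock.getAsLong();
        entries.values().removeIf(e -> e.expiresAtMillis <= now);
        List<String> messages = new ArrayList<>();
        for (Entry e : entries.values()) {
            messages.add(e.message);
        }
        return messages;
    }
}
```

With this shape, the event pool example would call {{putPermanent}}, while the quota example would call {{putTemporary}} with a 30 minute window.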



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
