[
https://issues.apache.org/jira/browse/SLING-6855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16010189#comment-16010189
]
Bertrand Delacretaz commented on SLING-6855:
--------------------------------------------
bq. Could it be easier/better to just record metrics?
I like the idea, especially if it means no changes to the health check APIs.
Let me try to reformulate to see if I understand your idea, with some
additional tentative details:
* An HC service that has an {{hc.registermeters}} or maybe
{{hc.generatemetrics}} property causes metrics on its results to be generated
via the Sling metrics bundle [1].
* Such HCs are executed at regular intervals to compute these metrics. The
interval in seconds might be the value of that {{hc.generatemetrics}} property.
* HCs can be configured to watch these metrics and complain if values are out
of range.
* When out-of-range values are flagged, the alarm remains valid for a
configurable amount of time. How exactly that happens is still to be defined.
[1] https://sling.apache.org/documentation/bundles/metrics.html
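To make the last two points a bit more concrete, here is a minimal sketch in plain Java of a watcher that flags out-of-range values and keeps the alarm raised for a configurable hold time. It deliberately uses neither the Health Check nor the metrics API: {{StickyRangeWatcher}}, its {{Status}} enum and all parameter names are hypothetical, and the metric is just a {{LongSupplier}} standing in for whatever gauge or counter would really be read. Treat it as one possible reading of the idea, not as a proposed implementation.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch: watches one numeric metric and keeps an alarm
 * raised for a configurable hold time after the last out-of-range reading.
 * None of these names come from the Sling Health Check or metrics APIs.
 */
class StickyRangeWatcher {

    enum Status { OK, WARN }

    private final LongSupplier metric;  // stands in for a gauge or counter reading
    private final long min;             // lowest acceptable value (inclusive)
    private final long max;             // highest acceptable value (inclusive)
    private final long holdMillis;      // how long an alarm stays valid
    private final LongSupplier clock;   // injectable time source, in milliseconds

    private boolean alarmed = false;
    private long lastViolationAt = 0L;

    StickyRangeWatcher(LongSupplier metric, long min, long max,
                       long holdMillis, LongSupplier clock) {
        this.metric = metric;
        this.min = min;
        this.max = max;
        this.holdMillis = holdMillis;
        this.clock = clock;
    }

    /** Meant to be called at a regular interval, for example by a scheduler. */
    synchronized Status check() {
        long now = clock.getAsLong();
        long value = metric.getAsLong();
        if (value < min || value > max) {
            // Out of range: raise the alarm, or extend it if already raised.
            alarmed = true;
            lastViolationAt = now;
        } else if (alarmed && now - lastViolationAt >= holdMillis) {
            // Back in range and the hold time has elapsed: clear the alarm.
            alarmed = false;
        }
        return alarmed ? Status.WARN : Status.OK;
    }
}
```

In this sketch the hold time restarts on every out-of-range reading, so a flapping value keeps the alarm up. Whether that is the desired behavior is exactly the part that is still to be defined.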
> Create ResultRegistry to provide health check behavior for executing code
> that does not want a HealthCheck
> ----------------------------------------------------------------------------------------------------------
>
> Key: SLING-6855
> URL: https://issues.apache.org/jira/browse/SLING-6855
> Project: Sling
> Issue Type: New Feature
> Components: Health Check
> Reporter: Clinton H Goudie-Nice
>
> I want to provide a Registry service that can be leveraged to provide health
> check results.
> These results can remain in effect for a period of time (via an expiration),
> until the JVM is restarted, or until they are explicitly removed.
> This can be useful when code observes a specific (possibly bad) state, and
> wants to alert through the health check API that this state has taken place.
> Some examples:
> An event pool has filled, and some events will be thrown away.
> This is a failure case that requires a restart of the instance.
> It would be appropriate to trigger a permanent failure.
>
> A quota has been tripped. This quota may immediately recover, but it is
> sensible to alert for 30 minutes that the quota has been tripped.
> If you expect the failure will clear itself within a certain window, setting
> the expiration to that window can be ideal.
> GitHub pull request to follow.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)