[jira] [Commented] (SLING-12262) Repoinit: report failures via metrics

Joerg Hoh (Jira) Thu, 07 Mar 2024 03:18:28 -0800


    [ 
https://issues.apache.org/jira/browse/SLING-12262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824365#comment-17824365
 ]


Joerg Hoh commented on SLING-12262:
-----------------------------------

We scrape metrics via Prometheus and use an Alertmanager instance to create 
alerts from them. If an instance fails to start, it is much easier to determine 
whether repoinit is the culprit by querying a metric than by searching the 
logs for the characteristic repoinit exception. That lets us refine the 
"instance-not-starting-up" alert into an 
"instance-not-starting-up-because-of-repoinit-issues" alert, which is much more 
meaningful and can be handled differently from the generic alert, which 
always requires the general triage process.
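
To illustrate the idea, here is a minimal sketch of such a failure counter in 
plain Java. It is not the actual repoinit implementation: the class name 
RepoinitFailureMetric, the method names and the metric name 
sling_repoinit_failures_total are all hypothetical, and a real change would 
more likely go through an existing metrics API than a hand-rolled counter.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical holder for a repoinit failure counter; not part of Sling. */
final class RepoinitFailureMetric {

    // Hypothetical metric name, chosen for this sketch only.
    static final String NAME = "sling_repoinit_failures_total";

    private final AtomicLong failures = new AtomicLong();

    /** Runs one repoinit step and counts it as failed if it throws. */
    void runCounted(Runnable step) {
        try {
            step.run();
        } catch (RuntimeException e) {
            failures.incrementAndGet();
            throw e; // startup still fails; the counter only records why
        }
    }

    /** Number of failed steps recorded so far. */
    long failures() {
        return failures.get();
    }

    /** Renders the counter in the Prometheus text exposition format. */
    String toPrometheusText() {
        return "# TYPE " + NAME + " counter\n"
                + NAME + " " + failures.get() + "\n";
    }
}
```

With a counter like this being scraped, the refined alert could key on an 
expression such as "sling_repoinit_failures_total > 0" (again assuming the 
hypothetical metric name above) instead of on log contents.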







> Repoinit: report failures via metrics
> -------------------------------------
>
>                 Key: SLING-12262
>                 URL: https://issues.apache.org/jira/browse/SLING-12262
>             Project: Sling
>          Issue Type: Task
>          Components: Repoinit
>    Affects Versions: Repoinit JCR 1.1.46
>            Reporter: Joerg Hoh
>            Priority: Major
>
> When a repoinit statement fails (and for that reason the SlingRepository 
> service cannot be started), repoinit should expose this as a metric.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
