Christian Schneider created SLING-8447:
------------------------------------------

             Summary: Provide current-retries metric for journaled distribution
                 Key: SLING-8447
                 URL: https://issues.apache.org/jira/browse/SLING-8447
             Project: Sling
          Issue Type: New Feature
          Components: Content Distribution
    Affects Versions: Content Distribution Journal Core 0.1.0
            Reporter: Christian Schneider
             Fix For: Content Distribution Journal Core 0.1.2


When operating a Sling system with content distribution, it is important to
detect when a publisher is stuck.

A good indicator of this is the same package being retried more than a
certain number of times.

Currently there is only an absolute metric of failed packages. Taking the
derivative (rate of change) of that total makes it possible to detect a
growing number of failed packages. Unfortunately, you cannot distinguish
between one package being retried 10 times and 10 packages being retried
once each.

So I propose to create a new metric, current-retries, as a gauge. This metric
reports how often the current package has been retried. It grows while the
same package is retried and resets to 0 when the package is successfully
applied or when the server is restarted.
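
The gauge semantics described above could be sketched roughly as follows. This
is only an illustration, not the actual Content Distribution Journal code: the
class name CurrentRetriesTracker and its methods are hypothetical, and wiring
the gauge into a metrics registry is left out. Because the counter lives in
memory, it starts at 0 again after a restart.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

/** Hypothetical helper that tracks retries of the package currently being imported. */
final class CurrentRetriesTracker {

    // In-memory only, so the value is 0 again after a server restart.
    private final AtomicInteger currentRetries = new AtomicInteger(0);

    /** Call when importing the current package failed and it will be retried. */
    void onImportFailed() {
        currentRetries.incrementAndGet();
    }

    /** Call when the current package was applied successfully. */
    void onImportSucceeded() {
        currentRetries.set(0);
    }

    /** Supplier that a metrics library could poll to expose the gauge value. */
    IntSupplier gauge() {
        return currentRetries::get;
    }
}
```

With this sketch, one package failing 10 times yields a gauge value of 10,
while 10 packages each failing once never push the gauge above 1.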

With this metric it is very easy to detect a blocked publisher as you simply 
need to check if the metric exceeds a limit.
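
As an illustration, such a check might look like the sketch below. The class
name BlockedPublisherCheck and the limit value are assumptions chosen for the
example; in practice the comparison would more likely live in an alerting rule
of the monitoring system than in application code.

```java
import java.util.function.IntSupplier;

/** Hypothetical check that flags a publisher as blocked based on the gauge value. */
final class BlockedPublisherCheck {

    private final IntSupplier currentRetries; // source of the current-retries gauge value
    private final int limit;                  // assumed alerting threshold

    BlockedPublisherCheck(IntSupplier currentRetries, int limit) {
        this.currentRetries = currentRetries;
        this.limit = limit;
    }

    /** Returns true when the current package has been retried more than the limit. */
    boolean isBlocked() {
        return currentRetries.getAsInt() > limit;
    }
}
```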



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
