Christian Schneider created SLING-8447:
------------------------------------------
Summary: Provide current-retries metric for journaled distribution
Key: SLING-8447
URL: https://issues.apache.org/jira/browse/SLING-8447
Project: Sling
Issue Type: New Feature
Components: Content Distribution
Affects Versions: Content Distribution Journal Core 0.1.0
Reporter: Christian Schneider
Fix For: Content Distribution Journal Core 0.1.2
When operating a Sling system with content distribution, it is important to
detect when a publisher is stuck.
A good indicator for this is the same package being retried more than a
certain number of times.
Currently there is only an absolute metric of failed packages. By taking the
derivative of that total it is possible to detect a growing number of failed
packages. Unfortunately, you cannot distinguish between one package being
retried 10 times and 10 packages being retried once each.
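To make the ambiguity concrete, here is a minimal sketch. It is not Sling code; the class `FailedTotalDemo`, its methods, and the package ids are hypothetical and exist only for illustration. It models the existing metric as a plain count of failed attempts and builds the two scenarios described above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration, not part of Sling: models a total "failed packages" count.
class FailedTotalDemo {

    // The total counts every failed attempt, no matter which package it belongs to.
    static int totalFailures(List<String> failedAttempts) {
        return failedAttempts.size();
    }

    // Scenario A: a single package ("pkg-1") fails ten times in a row.
    static List<String> onePackageTenTimes() {
        return Collections.nCopies(10, "pkg-1");
    }

    // Scenario B: ten different packages fail once each.
    static List<String> tenPackagesOnce() {
        List<String> attempts = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            attempts.add("pkg-" + i);
        }
        return attempts;
    }
}
```

Both scenarios yield the same total, so the total (and its rate of change) cannot tell a stuck publisher from a series of unrelated one-off failures.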
So I propose to create a new metric, current-retries, as a gauge. This metric
reports how many times the current package has been retried. It grows while
the same package is retried and resets to 0 when the package is successfully
applied or when the server is restarted.
With this metric it is very easy to detect a blocked publisher: you simply
need to check whether the metric exceeds a limit.
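The proposed gauge and the limit check could be sketched roughly as follows. This is only an illustration under assumptions, not the actual implementation: the class `CurrentRetries` and the methods `packageFailed`, `packageApplied`, and `exceeds` are hypothetical names, and registering the value with a metrics service is not shown. The sketch also assumes that every failed attempt on the current package increments the value.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch of a current-retries gauge; names are illustrative only.
class CurrentRetries implements Supplier<Integer> {

    // Held in memory only, so a server restart implicitly resets it to 0.
    private final AtomicInteger retries = new AtomicInteger();

    // Call when applying the current package failed and it will be retried.
    void packageFailed() {
        retries.incrementAndGet();
    }

    // Call when the current package was applied successfully.
    void packageApplied() {
        retries.set(0);
    }

    // Current gauge value, as a metrics reporter would read it.
    @Override
    public Integer get() {
        return retries.get();
    }

    // Blocked-publisher check: true once the gauge is above the given limit.
    boolean exceeds(int limit) {
        return retries.get() > limit;
    }
}
```

Exposing the value through a `Supplier` is a design choice made here so that the gauge could be handed to whatever metrics registration is in use; an alert would then fire once `exceeds(limit)` holds for a chosen limit.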
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)