José Correia created SLING-11181:
------------------------------------
Summary: Emit metrics that distinguish transient and permanent
distribution failures
Key: SLING-11181
URL: https://issues.apache.org/jira/browse/SLING-11181
Project: Sling
Issue Type: Improvement
Components: Content Distribution
Reporter: José Correia
h3. Context
Currently, our error metrics don't distinguish between distribution failures
that are permanent (they fail even when retried) and failures that are
transient (they succeed after being retried).
We want to be able to tell the two scenarios apart.
h3. Solution
The failure metric should be labeled as one of:
* {{Transient failure}}
* {{Permanent failure}}
h3. Proposed approach
We can distinguish the two scenarios using the following rule:
* A transient failure is a package that was distributed successfully but
needed more than one attempt: {{retries > 0}}
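
As a rough illustration of the transient rule above, here is a minimal Java sketch. The class name, method names, and the in-memory counter are all hypothetical and are not part of the Sling Content Distribution API; a real change would presumably report through the module's existing metrics service instead. The sketch covers only the transient rule.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper for illustration only; not an existing Sling class. */
final class TransientFailureClassifier {

    // Stand-in for a real metrics counter (assumption: a plain in-memory count).
    private final AtomicLong transientFailures = new AtomicLong();

    /**
     * Applies the rule from the issue: the package was eventually distributed,
     * but only after at least one retry (retries > 0).
     */
    static boolean isTransientFailure(boolean distributed, int retries) {
        return distributed && retries > 0;
    }

    /** Records the final outcome of one package and counts it if the rule matches. */
    void record(boolean distributed, int retries) {
        if (isTransientFailure(distributed, retries)) {
            transientFailures.incrementAndGet();
        }
    }

    /** Number of packages classified as transient failures so far. */
    long transientFailureCount() {
        return transientFailures.get();
    }
}
```

With this sketch, a package that succeeds on its first attempt ({{retries = 0}}) is not counted, while one that succeeds after two retries is counted once.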