Ottomata has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/328239 )
Change subject: Alert on EventBus service HTTP error rate
......................................................................
Alert on EventBus service HTTP error rate
Bug: T153034
Change-Id: Id8701a8ef08512488bd316b8b34872980dfa6cfe
---
M modules/role/manifests/graphite/alerts.pp
1 file changed, 11 insertions(+), 0 deletions(-)
Approvals:
Ottomata: Verified; Looks good to me, approved
Ppchelko: Looks good to me, but someone else must approve
diff --git a/modules/role/manifests/graphite/alerts.pp b/modules/role/manifests/graphite/alerts.pp
index d44f4c8..2d1bc4e 100644
--- a/modules/role/manifests/graphite/alerts.pp
+++ b/modules/role/manifests/graphite/alerts.pp
@@ -55,5 +55,16 @@
from => '10min',
percentage => 70,
}
+
+ # Monitor EventBus 4xx and 5xx HTTP response rate.
+ monitoring::graphite_threshold { 'eventbus_http_error_rate':
+ description => 'EventBus HTTP Error Rate (4xx + 5xx)',
+ metric => 'transformNull(sumSeries(eventbus.counters.eventlogging.service.EventHandler.POST.[45]*.rate))',
+ # If > 50% of datapoints over last 10 minutes is over thresholds, then alert.
+ warning => 1,
+ critical => 10,
+ from => '10min',
+ percentage => 50,
+ }
}
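
For readers unfamiliar with the threshold semantics described in the comment above, here is a rough sketch in Python of how such a check could be evaluated. It is not the actual monitoring::graphite_threshold implementation. The function names transform_null and threshold_status are hypothetical, and the strict "greater than" comparisons are an assumption taken from the wording of the comment. The sketch assumes the summed series arrives as a list of datapoints covering the 10 minute window, with missing values as None, which transformNull replaces with 0 by default.

```python
from typing import List, Optional, Sequence


def transform_null(points: Sequence[Optional[float]], default: float = 0.0) -> List[float]:
    """Replace missing datapoints with a default, like Graphite's transformNull()."""
    return [default if p is None else p for p in points]


def threshold_status(
    points: Sequence[Optional[float]],
    warning: float = 1.0,
    critical: float = 10.0,
    percentage: float = 50.0,
) -> str:
    """Hypothetical evaluation of the alert: returns OK, WARNING, CRITICAL or UNKNOWN.

    Assumption: the alert fires when MORE than `percentage` percent of the
    datapoints in the window are strictly above a threshold.
    """
    values = transform_null(points)
    if not values:
        # No data at all in the window; nothing to evaluate.
        return "UNKNOWN"

    def percent_over(limit: float) -> float:
        over = sum(1 for v in values if v > limit)
        return 100.0 * over / len(values)

    # Check the more severe threshold first.
    if percent_over(critical) > percentage:
        return "CRITICAL"
    if percent_over(warning) > percentage:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    # Ten datapoints standing in for the last 10 minutes of 4xx + 5xx rate.
    window = [2.0, 3.0, 0.0, None, 4.0, 5.0, 2.5, 0.0, 1.5, None]
    print(threshold_status(window))
```

With warning => 1, critical => 10 and percentage => 50 as in the change, a window where six of ten datapoints exceed 1 errors per second (but none exceed 10) would come out as WARNING under these assumptions.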
--
To view, visit https://gerrit.wikimedia.org/r/328239
Gerrit-MessageType: merged
Gerrit-Change-Id: Id8701a8ef08512488bd316b8b34872980dfa6cfe
Gerrit-PatchSet: 3
Gerrit-Project: operations/puppet
Gerrit-Branch: production
Gerrit-Owner: Ottomata <[email protected]>
Gerrit-Reviewer: Ottomata <[email protected]>
Gerrit-Reviewer: Ppchelko <[email protected]>
Gerrit-Reviewer: jenkins-bot <>