Gehel has uploaded a new change for review. ( https://gerrit.wikimedia.org/r/346710 )

Change subject: maps - increase number of retries before alert for postgresql lag check
......................................................................

maps - increase number of retries before alert for postgresql lag check

This should reduce the number of false positives pending further
investigation into their root cause.

Bug: T162345
Change-Id: I13eef098a586ac770b76ecfa707ff2c4a4aaa045
---
M modules/role/manifests/maps/slave.pp
1 file changed, 7 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/operations/puppet refs/changes/10/346710/1

diff --git a/modules/role/manifests/maps/slave.pp b/modules/role/manifests/maps/slave.pp
index b0ae6d7..1a9787f 100644
--- a/modules/role/manifests/maps/slave.pp
+++ b/modules/role/manifests/maps/slave.pp
@@ -17,8 +17,15 @@
     $warning = 300
     $command = "/usr/lib/nagios/plugins/check_postgres_replication_lag.py \
 -U replication -P ${replication_pass} -m ${master} -D template1 -C ${critical} -W ${warning}"
+
+    # This check generates a number of alerts, which recover quickly. It looks
+    # like lag suddenly jumps from 0 to a high number (multiple hours) and goes
+    # back to zero quickly. Increasing the number of retries will reduce the
+    # number of false positives while we investigate a better solution. See
+    # T162345 for details.
     nrpe::monitor_service { 'postgres-rep-lag':
         description  => 'Postgres Replication Lag',
         nrpe_command => $command,
+        retries      => 10,
     }
 }
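
As a rough illustration of why a higher retry count hides short-lived lag
spikes, here is a minimal sketch. It assumes that `retries` means "alert
only after this many consecutive failing checks", which is how Nagios
max_check_attempts usually behaves; the exact mapping in nrpe::monitor_service
is not shown in this change. The function name, the sample lag values and the
1800 second critical threshold are all hypothetical.

```python
def hard_alerts(lags, critical, retries):
    """Return the indices of checks at which an alert would fire.

    Assumption: an alert fires once a run of consecutive checks above
    `critical` reaches `retries` in length, and any passing check
    resets the run.
    """
    fired = []
    consecutive = 0
    for i, lag in enumerate(lags):
        if lag > critical:
            consecutive += 1
            if consecutive == retries:
                fired.append(i)
        else:
            consecutive = 0
    return fired


# Hypothetical lag samples in seconds: brief jumps to two hours, then back to 0.
spiky = [0, 0, 7200, 0, 0, 7200, 7200, 0]
# Hypothetical sustained lag: twelve failing checks in a row.
sustained = [7200] * 12

print(hard_alerts(spiky, critical=1800, retries=1))
print(hard_alerts(spiky, critical=1800, retries=10))
print(hard_alerts(sustained, critical=1800, retries=10))
```

Under these assumptions the short spikes alert only when retries is 1, while
lag that stays high is still reported, just later. The trade-off is slower
detection of a real replication problem.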

-- 
To view, visit https://gerrit.wikimedia.org/r/346710
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I13eef098a586ac770b76ecfa707ff2c4a4aaa045
Gerrit-PatchSet: 1
Gerrit-Project: operations/puppet
Gerrit-Branch: production
Gerrit-Owner: Gehel <[email protected]>

_______________________________________________
MediaWiki-commits mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
