Ottomata has uploaded a new change for review.

  https://gerrit.wikimedia.org/r/177662

Change subject: Move Kafka MessagesIn anomaly alert to role::graphite::production
......................................................................

Move Kafka MessagesIn anomaly alert to role::graphite::production

This alert is computed on a sumSeries, so it does not belong on each
individual Kafka broker. The definition in kafka.pp will be removed once
ensure => absent has removed the per-broker checks from icinga.

Change-Id: I897afa9223b2e6031547f84ca30a0644f1b03cd9
---
M manifests/role/analytics/kafka.pp
M manifests/role/graphite.pp
2 files changed, 17 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/operations/puppet refs/changes/62/177662/1

diff --git a/manifests/role/analytics/kafka.pp b/manifests/role/analytics/kafka.pp
index 12d4726..66b4018 100644
--- a/manifests/role/analytics/kafka.pp
+++ b/manifests/role/analytics/kafka.pp
@@ -210,6 +210,8 @@
 
     # Use graphite's anomaly detection support.
     monitoring::graphite_anomaly { 'kafka-broker-MessagesIn-anomaly':
+        # moving this to role::graphite::production since it is not a node based metric.
+        ensure       => 'absent',
         description  => 'Kafka Broker Messages In Per Second',
         metric       => 'sumSeries(kafka.*.kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.OneMinuteRate.value)',
         # check over the 60 data points (an hour?) and:
diff --git a/manifests/role/graphite.pp b/manifests/role/graphite.pp
index 85cc2e6..d1cbf1b 100644
--- a/manifests/role/graphite.pp
+++ b/manifests/role/graphite.pp
@@ -237,6 +237,21 @@
         check_window => 100,
         over         => true
     }
+
+    # Use graphite's anomaly detection support.
+    monitoring::graphite_anomaly { 'kafka-broker-MessagesIn-anomaly':
+        description  => 'Kafka Broker Messages In Per Second',
+        metric       => 'sumSeries(kafka.*.kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.OneMinuteRate.value)',
+        # check over the 60 data points (an hour?) and:
+        # - alert warn if more than 30 are under the confidence band
+        # - alert critical if more than 45 are under the confidence band
+        check_window => 60,
+        warning      => 30,
+        critical     => 45,
+        under        => true,
+        require      => Class['::kafka::server::jmxtrans'],
+        group        => $nagios_servicegroup,
+    }
 }
 
 # == Class: role::graphite::labmon
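
For readers unfamiliar with the thresholds in the hunk above, the following
is a minimal Python sketch of the rule the comments describe: look at the
last check_window data points and count how many fall under the lower
confidence band. It is not the actual icinga/graphite check; the function
name classify_anomaly, its inputs (a list of metric values and a list of
lower-band values), and the strict "more than" comparison are assumptions
made for illustration only.

```python
def classify_anomaly(values, lower_band, check_window=60, warning=30, critical=45):
    """Hypothetical helper: classify a series against its lower confidence band.

    values and lower_band are equal-length sequences of numbers (or None for
    missing data points). Only the most recent check_window points are used.
    """
    # Pair each value with its band and keep only the most recent window.
    recent = list(zip(values, lower_band))[-check_window:]

    # Count points that are strictly under the lower band, skipping gaps.
    under = sum(
        1
        for value, low in recent
        if value is not None and low is not None and value < low
    )

    # "More than" is read here as a strict comparison (an assumption).
    if under > critical:
        return "CRITICAL"
    if under > warning:
        return "WARNING"
    return "OK"
```

With the defaults taken from the diff (check_window => 60, warning => 30,
critical => 45), a window in which 31 of the 60 points sit under the band
would be classified as WARNING in this sketch, and 46 of 60 as CRITICAL.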

-- 
To view, visit https://gerrit.wikimedia.org/r/177662
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I897afa9223b2e6031547f84ca30a0644f1b03cd9
Gerrit-PatchSet: 1
Gerrit-Project: operations/puppet
Gerrit-Branch: production
Gerrit-Owner: Ottomata <[email protected]>

_______________________________________________
MediaWiki-commits mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
