Tim Starling has submitted this change and it was merged.

Change subject: Monitor MediaWiki fatals and exceptions in Ganglia
......................................................................


Monitor MediaWiki fatals and exceptions in Ganglia

This change extends the EventLogging Puppet module to configure Ganglia
monitoring of MediaWiki fatals and exceptions.

Change I1632a6b19 configured fluorine to forward MediaWiki fatals and
exceptions to vanadium via UDP port 8423. This change configures an
EventLogging UDP-to-ZMQ router that republishes the same stream over ZeroMQ on
TCP port 8423. (ZeroMQ lets multiple subscribers consume the stream; with
unicast UDP, binding several sockets to one port via SO_REUSEADDR does not
deliver each datagram to every listener.)
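
To illustrate the forwarding idea described above, here is a minimal sketch in
Python. It is not the actual 'udp2zmq' program from EventLogging; the function
names forward_datagrams and main are hypothetical, and main assumes pyzmq is
installed. The loop takes any callable as its publisher, so it can be tried
without ZeroMQ.

```python
import socket


def forward_datagrams(udp_sock, publish, limit=None, bufsize=65535):
    """Read datagrams from udp_sock and hand each one to publish().

    Runs forever when limit is None; otherwise stops after `limit`
    datagrams. Returns the number of datagrams forwarded.
    """
    forwarded = 0
    while limit is None or forwarded < limit:
        datagram = udp_sock.recv(bufsize)
        publish(datagram)
        forwarded += 1
    return forwarded


def main(port=8423):
    """Hypothetical entry point: republish UDP datagrams on a ZeroMQ PUB socket."""
    # Imported here so forward_datagrams() can be used without pyzmq.
    import zmq

    udp_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    udp_sock.bind(('0.0.0.0', port))

    # UDP and TCP have separate port spaces, so both can use the same number.
    pub = zmq.Context.instance().socket(zmq.PUB)
    pub.bind('tcp://*:%d' % port)

    forward_datagrams(udp_sock, pub.send)
```

Calling main() would block and forward until interrupted. Any number of SUB
sockets can then connect to the TCP endpoint, which is the property the UDP
stream alone lacks.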

This change also sets up a metric gathering module that reports errors (broken
down by type) to Ganglia. Error types are detected using simple substring
matching.
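
The substring matching can be sketched on its own, separate from ZeroMQ and
Gmond. The pattern table below is copied from mwerrors.py in this change; the
helper names classify and tally are hypothetical and exist only for this
illustration. Order matters: the first matching pattern wins, so the specific
'Fatal error: ...' entries must come before the generic 'Fatal error:' one.

```python
from collections import Counter

# (substring, metric name) pairs, in the same order as in mwerrors.py.
PATTERNS = (
    ('Fatal error: Out of memory', 'oom'),
    ('Fatal error: Maximum execution time', 'timelimit'),
    ('Fatal error:', 'fatal'),
    ('Exception from', 'exception'),
    ('Catchable fatal error', 'catchable'),
    ('DatabaseBase->reportQueryError', 'query'),
)


def classify(line):
    """Return the metric name of the first matching pattern, or None."""
    for substring, name in PATTERNS:
        if substring in line:
            return name
    return None


def tally(lines):
    """Count how many lines fall under each metric name."""
    counts = Counter()
    for line in lines:
        name = classify(line)
        if name is not None:
            counts[name] += 1
    return counts
```

Lines that match no pattern are ignored, which mirrors the loop in
count_errors.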

Port 8423 is hard-coded in three places (twice in this change, once in
I1632a6b19), which is unfortunate. Instead of plopping static configuration
files in /etc/supervisor, the EventLogging Puppet module should declare
parametrized resource types for common patterns, like UDP-to-ZMQ forwarding.
I intend to do this sometime in the next month or two.
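
The planned cleanup would live in Puppet, but the underlying idea (derive
every occurrence of the port from one parameter) can be sketched in Python.
This is only an illustration under that assumption: the function
render_udp2zmq_conf and its arguments are hypothetical, and the template
simply mirrors the static mwerrors.conf added by this change.

```python
# Template mirroring the supervisord stanzas in mwerrors.conf.
TEMPLATE = """\
[group:{group}]
programs = udp2zmq_{port}

[program:udp2zmq_{port}]
command = udp2zmq {port}
user = {user}
"""


def render_udp2zmq_conf(group, port, user='eventlogging'):
    """Render a supervisord config for one UDP-to-ZMQ forwarder."""
    return TEMPLATE.format(group=group, port=int(port), user=user)
```

With this shape the port is stated once by the caller, for example
render_udp2zmq_conf('mwerrors', 8423), rather than repeated in a static file.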

Change-Id: I55450783d018ed7fd7399ee5adf4305af156a59b
---
A modules/eventlogging/files/mwerrors.conf
A modules/eventlogging/files/mwerrors.py
A modules/eventlogging/files/mwerrors.pyconf
M modules/eventlogging/manifests/init.pp
A modules/eventlogging/manifests/mediawiki_errors.pp
5 files changed, 214 insertions(+), 0 deletions(-)

Approvals:
  Tim Starling: Verified; Looks good to me, approved
  jenkins-bot: Verified



diff --git a/modules/eventlogging/files/mwerrors.conf b/modules/eventlogging/files/mwerrors.conf
new file mode 100644
index 0000000..2982121
--- /dev/null
+++ b/modules/eventlogging/files/mwerrors.conf
@@ -0,0 +1,10 @@
+; Supervisord configuration for 'mwerrors' Ganglia module.
+; Managed by Puppet: puppet:///files/eventlogging/mwerrors.conf
+; Forward MediaWiki fatals / exceptions to ZeroMQ
+
+[group:mwerrors]
+programs = udp2zmq_8423
+
+[program:udp2zmq_8423]
+command = udp2zmq 8423
+user = eventlogging
diff --git a/modules/eventlogging/files/mwerrors.py b/modules/eventlogging/files/mwerrors.py
new file mode 100755
index 0000000..1871722
--- /dev/null
+++ b/modules/eventlogging/files/mwerrors.py
@@ -0,0 +1,122 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+"""
+  Gmond metric-gathering module for MediaWiki fatals and exceptions
+
+  Reads fatals / exceptions from a ZeroMQ publisher. MediaWiki logs to a file
+  or a UDP socket, so for this to work you will also need a UDP-to-ZMQ router.
+  See 'udp2zmq' in EventLogging.
+
+  When invoked by itself, runs a self-test.
+
+  Usage: mwerrors.py tcp://HOST:PORT
+
+  Written by Ori Livneh <[email protected]>
+
+"""
+import sys
+reload(sys)
+sys.setdefaultencoding('utf8')
+
+import errno
+import threading
+import time
+
+import zmq
+
+
+patterns = (
+    # Substring to match                     # Metric      # Metric title
+    ('Fatal error: Out of memory',           'oom',        'Out-of-memory fatals'),
+    ('Fatal error: Maximum execution time',  'timelimit',  'Time limit fatals'),
+    ('Fatal error:',                         'fatal',      'Miscellaneous fatals'),
+    ('Exception from',                       'exception',  'Exceptions'),
+    ('Catchable fatal error',                'catchable',  'Catchable fatals'),
+    ('DatabaseBase->reportQueryError',       'query',      'Query errors'),
+)
+
+
+def count_errors(counter, endpoint):
+    """Count error types in error stream."""
+    ctx = zmq.Context.instance()
+    sock = ctx.socket(zmq.SUB)
+    sock.connect(endpoint)
+    sock.setsockopt(zmq.SUBSCRIBE, b'')
+
+    while 1:
+        try:
+            line = sock.recv()
+            for pattern, name, description in patterns:
+                if pattern in line:
+                    counter[name] += 1
+                    break
+        except zmq.ZMQError as e:
+            # Calls interrupted by EINTR should be re-tried.
+            if e.errno == errno.EINTR:
+                continue
+            raise
+
+
+def metric_init(params):
+    """
+    Initialize; part of Gmond interface
+
+    `params` is a dictionary of configuration options, generated by
+    Ganglia out of values specified in the module's .pyconf file. It
+    should contain an 'endpoint' key, specifying the address of the
+    streaming endpoint. Example:
+
+        param endpoint {
+            value = 'tcp://127.0.0.1:8423'
+        }
+
+    """
+    endpoint = params['endpoint']
+    counter = {name: 0 for pattern, name, description in patterns}
+
+    thread = threading.Thread(target=count_errors, args=(counter, endpoint))
+    thread.daemon = True
+    thread.start()
+
+    time.sleep(2)
+
+    return [{
+        'name': name,
+        'value_type': 'uint',
+        'format': '%d',
+        'units': 'errors',
+        'slope': 'positive',
+        'time_max': 15,
+        'description': description,
+        'groups': 'mediawiki',
+        'call_back': counter.get,
+    } for pattern, name, description in patterns]
+
+
+def metric_cleanup():
+    """Teardown; part of Gmond interface"""
+    pass
+
+
+if __name__ == '__main__':
+    # Self-test: report metrics to stdout every 10 seconds.
+    import sys
+
+    if len(sys.argv) != 2:
+        sys.exit('Usage: %s tcp://HOST:PORT' % __file__)
+
+    params = {'endpoint': sys.argv[1]}
+    metrics = metric_init(params)
+
+    print('Streaming errors from %(endpoint)s...' % params)
+
+    while 1:
+        print('\n{:-^32}'.format(time.asctime()))
+        for metric in metrics:
+            call_back = metric['call_back']
+            name = metric['name']
+            description = metric['description']
+            print('{:.<30}{}'.format(description, call_back(name)))
+        time.sleep(10)
+
+# vim: set et ft=python ts=4 sw=4:
diff --git a/modules/eventlogging/files/mwerrors.pyconf b/modules/eventlogging/files/mwerrors.pyconf
new file mode 100644
index 0000000..6435a29
--- /dev/null
+++ b/modules/eventlogging/files/mwerrors.pyconf
@@ -0,0 +1,49 @@
+/**
+ * MediaWiki exceptions & fatals monitoring
+ * File managed by Puppet: puppet:///files/eventlogging/mwerrors.pyconf
+ */
+
+modules {
+  module {
+    name = "mwerrors"
+    language = "python"
+  }
+}
+
+
+collection_group {
+
+  collect_every = 15
+  time_threshold = 30
+
+  metric {
+      name = "oom"
+      title = "Out-of-memory fatals"
+      value_threshold = 1
+  }
+  metric {
+      name = "timelimit"
+      title = "Time limit fatals"
+      value_threshold = 1
+  }
+  metric {
+      name = "fatal"
+      title = "Miscellaneous fatals"
+      value_threshold = 1
+  }
+  metric {
+      name = "exception"
+      title = "Exceptions"
+      value_threshold = 1
+  }
+  metric {
+      name = "catchable"
+      title = "Catchable fatals"
+      value_threshold = 1
+  }
+  metric {
+      name = "query"
+      title = "Query errors"
+      value_threshold = 1
+  }
+}
diff --git a/modules/eventlogging/manifests/init.pp b/modules/eventlogging/manifests/init.pp
index f585681..1797394 100644
--- a/modules/eventlogging/manifests/init.pp
+++ b/modules/eventlogging/manifests/init.pp
@@ -14,6 +14,8 @@
                bind_ip => false,
        }
 
+       class { 'eventlogging::mediawiki_errors': }
+
        package { [
                'python-jsonschema',
                'python-mysqldb',
diff --git a/modules/eventlogging/manifests/mediawiki_errors.pp b/modules/eventlogging/manifests/mediawiki_errors.pp
new file mode 100644
index 0000000..2870b84
--- /dev/null
+++ b/modules/eventlogging/manifests/mediawiki_errors.pp
@@ -0,0 +1,31 @@
+# Monitor MediaWiki errors using Ganglia
+class eventlogging::mediawiki_errors {
+
+       file { '/usr/lib/ganglia/python_modules/mwerrors.py':
+               ensure  => present,
+               source  => 'puppet:///modules/eventlogging/mwerrors.py',
+               require => [
+                       File['/usr/lib/ganglia/python_modules'],
+                       Package['python-zmq'],
+               ],
+       }
+
+       file { '/etc/supervisor/conf.d/mwerrors.conf':
+               source  => 'puppet:///modules/eventlogging/mwerrors.conf',
+               require => [ Package['supervisor'], Systemuser['eventlogging'] ],
+               notify  => Service['supervisor'],
+               mode    => '0444',
+       }
+
+       file { '/etc/ganglia/conf.d/mwerrors.pyconf':
+               ensure   => present,
+               source   => 'puppet:///modules/eventlogging/mwerrors.pyconf',
+               require  => [
+                       File['/etc/ganglia/conf.d'],
+                       File['/usr/lib/ganglia/python_modules/mwerrors.py'],
+                       File['/etc/supervisor/conf.d/mwerrors.conf'],
+               ],
+               notify   => Service[gmond],
+       }
+
+}

-- 
To view, visit https://gerrit.wikimedia.org/r/59059
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I55450783d018ed7fd7399ee5adf4305af156a59b
Gerrit-PatchSet: 2
Gerrit-Project: operations/puppet
Gerrit-Branch: production
Gerrit-Owner: Ori.livneh <[email protected]>
Gerrit-Reviewer: Hashar <[email protected]>
Gerrit-Reviewer: Lcarr <[email protected]>
Gerrit-Reviewer: MZMcBride <[email protected]>
Gerrit-Reviewer: Ori.livneh <[email protected]>
Gerrit-Reviewer: Reedy <[email protected]>
Gerrit-Reviewer: Tim Starling <[email protected]>
Gerrit-Reviewer: jenkins-bot

_______________________________________________
MediaWiki-commits mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
