Faidon Liambotis has uploaded a new change for review. https://gerrit.wikimedia.org/r/120774
Change subject: twmeproxy: monitor the memcache port as well
......................................................................

twmeproxy: monitor the memcache port as well

We just had a complicated failure scenario where the root cause was that
twemproxy had stopped listening on the 11211 port but kept running and
listening on the 22222 management socket. twemproxy didn't log anything
anywhere and strace/gdb didn't reveal more, so this remains a puzzle for
now.

In the meantime, add an NRPE check to monitor 11211 as well so that at
least we can be alerted if/when that happens again.

Change-Id: I09c0fce64160e6d58f2a85599dd17c57e4aa923d
---
M manifests/role/applicationserver.pp
1 file changed, 4 insertions(+), 0 deletions(-)

  git pull ssh://gerrit.wikimedia.org:29418/operations/puppet refs/changes/74/120774/1

diff --git a/manifests/role/applicationserver.pp b/manifests/role/applicationserver.pp
index d11d434..9aabe97 100644
--- a/manifests/role/applicationserver.pp
+++ b/manifests/role/applicationserver.pp
@@ -70,6 +70,10 @@
             description => "twemproxy process",
             nrpe_command => "/usr/lib/nagios/plugins/check_procs -c 1:1 -u nobody -C nutcracker"
         }
+        nrpe::monitor_service { 'twemproxy port':
+            description => 'twemproxy port',
+            nrpe_command => '/usr/lib/nagios/plugins/check_tcp -H 127.0.0.1 -p 11211 --timeout=2',
+        }
     }
 
     if $::realm == 'labs' {

-- 
To view, visit https://gerrit.wikimedia.org/r/120774
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I09c0fce64160e6d58f2a85599dd17c57e4aa923d
Gerrit-PatchSet: 1
Gerrit-Project: operations/puppet
Gerrit-Branch: production
Gerrit-Owner: Faidon Liambotis <fai...@wikimedia.org>
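
Note for readers of this change: the existing check_procs check only confirms that a nutcracker process exists, which is why it stayed green during the failure described above. The new check instead attempts a TCP connection to 127.0.0.1:11211 with a 2-second timeout. A rough Python sketch of that idea follows. It is an illustration only, not the check_tcp plugin's implementation; the function name check_tcp_port and its message strings are hypothetical, and only the host, port, and timeout values come from the change.

```python
import socket
from typing import Tuple

# Nagios-style exit codes (0 = OK, 2 = CRITICAL).
OK = 0
CRITICAL = 2


def check_tcp_port(host: str, port: int, timeout: float = 2.0) -> Tuple[int, str]:
    """Try to open a TCP connection and report a Nagios-style result.

    Hypothetical helper: it approximates the idea behind
    `check_tcp -H 127.0.0.1 -p 11211 --timeout=2`, nothing more.
    """
    try:
        # create_connection raises OSError on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return OK, "TCP OK - %s:%d accepts connections" % (host, port)
    except OSError as exc:
        return CRITICAL, "TCP CRITICAL - %s:%d: %s" % (host, port, exc)


# Example call with the values used in the change:
#   code, message = check_tcp_port("127.0.0.1", 11211, timeout=2.0)
```

A process that is alive but no longer listening on 11211 would pass a process-count check and fail this one, which is the gap the change closes.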