We don't use many critical alerts (the kind that have our NOC wake up an
engineer), but the main one we do have is a check that tells us whether there
are 2 or more hosts with OSDs that are down. We have clusters with 60 servers
in them, so having an OSD die and backfilling off of it isn't something to wake
up for in the middle of the night, but having OSDs down on 2 servers is 1 OSD
away from data loss. A quick reference for how to do this check in bash is
below.
# Count the hosts in `ceph osd tree` that have at least one down OSD.
# `grep -B1 down` pulls in the host line above each down OSD, and
# `grep -c host` counts the host lines that survive.
hosts_with_down_osds=$(ceph osd tree | grep 'host\|down' | grep -B1 down | grep -c host)

if [ "$hosts_with_down_osds" -ge 2 ]; then
    echo critical
elif [ "$hosts_with_down_osds" -eq 1 ]; then
    echo warning
elif [ "$hosts_with_down_osds" -eq 0 ]; then
    echo ok
else
    echo unknown
fi
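The grep pipeline above is sensitive to changes in the text layout of `ceph
osd tree`. If you'd rather not scrape text output, the same count can be
derived from `ceph osd tree -f json`. Below is a sketch of that approach
(sample JSON is inlined here in place of a live cluster, and the field names
assume the usual shape of that output: host nodes carry "children" OSD ids,
and OSD nodes carry a "status" of "up" or "down"):

```python
import json

# Abbreviated stand-in for the output of `ceph osd tree -f json`.
sample = json.loads("""
{
  "nodes": [
    {"id": -1, "name": "default", "type": "root", "children": [-2, -3]},
    {"id": -2, "name": "host-a", "type": "host", "children": [0, 1]},
    {"id": 0, "name": "osd.0", "type": "osd", "status": "up"},
    {"id": 1, "name": "osd.1", "type": "osd", "status": "down"},
    {"id": -3, "name": "host-b", "type": "host", "children": [2]},
    {"id": 2, "name": "osd.2", "type": "osd", "status": "up"}
  ],
  "stray": []
}
""")

def hosts_with_down_osds(tree):
    """Count hosts that have at least one down OSD."""
    # Map each OSD id to its status.
    status = {n["id"]: n.get("status")
              for n in tree["nodes"] if n["type"] == "osd"}
    # A host counts if any of its child OSDs is down.
    return sum(
        1
        for n in tree["nodes"]
        if n["type"] == "host"
        and any(status.get(c) == "down" for c in n.get("children", []))
    )

count = hosts_with_down_osds(sample)
if count >= 2:
    print("critical")
elif count == 1:
    print("warning")
else:
    print("ok")
```

In a real check you would replace the inlined sample with the output of
`subprocess.run(["ceph", "osd", "tree", "-f", "json"], ...)` and feed stdout
to `json.loads`.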
________________________________
[cid:[email protected]]<https://storagecraft.com> David
Turner | Cloud Operations Engineer | StorageCraft Technology
Corporation<https://storagecraft.com>
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2760 | Mobile: 385.224.2943
________________________________
________________________________
From: ceph-users [[email protected]] on behalf of Chris Jones
[[email protected]]
Sent: Friday, January 13, 2017 1:15 PM
To: [email protected]
Subject: [ceph-users] Ceph Monitoring
General question/survey:
Those of you with larger clusters, how are you doing alerting/monitoring?
Meaning, do you trigger off of 'HEALTH_WARN', etc.? I'm not really talking
about collectd-related metrics, but more about initial alerts of an issue or
potential issue. Basically, what thresholds do you use? Just trying to get a
pulse on what others are doing.
Thanks in advance.
--
Best Regards,
Chris Jones
Bloomberg
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com