Is there an equivalent of 'ceph health', but for OSDs?

Something that warns about slowness or trouble communicating between
OSDs, for example?

I've spent a good amount of time debugging what looked like stuck PGs,
but it turned out to be a bad NIC, and that only became apparent once
I saw OSD logs like these:

2016-02-08 03:42:27.810289 7fc9b8bff700 -1 osd.9 146800 heartbeat_check: no reply from osd.14 ever on either front or back, first ping sent 2016-02-08 03:39:24.860852 (cutoff 2016-02-08 03:39:27.810288)
2016-02-08 03:42:27.810297 7fc9b8bff700 -1 osd.9 146800 heartbeat_check: no reply from osd.15 ever on either front or back, first ping sent 2016-02-08 03:39:24.860852 (cutoff 2016-02-08 03:39:27.810288)
2016-02-08 03:42:28.311125 7fc9b8bff700 -1 osd.9 146800 heartbeat_check: no reply from osd.14 ever on either front or back, first ping sent 2016-02-08 03:39:24.860852 (cutoff 2016-02-08 03:39:28.311124)

(It really was a bad NIC. Fuck Emulex.)

Is there anything that could dump things like "failed heartbeats in
the last 10 minutes" or similar stats?
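Lacking a built-in command, one rough workaround is to scrape the OSD
logs for heartbeat_check failures. Below is a minimal Python sketch,
not a Ceph tool: the regex assumes the exact heartbeat_check line
format quoted above with one entry per log line, and the function name
failed_heartbeats and the 10-minute default window are my own
hypothetical choices. It counts failures per (reporting OSD, silent
peer) pair within the window ending at the newest matching entry:

```python
import re
from collections import Counter
from datetime import datetime, timedelta

# Assumed format (taken from the log excerpt, not from any Ceph spec):
# "<date> <time> <thread> <level> osd.N <epoch> heartbeat_check: no reply from osd.M ..."
HEARTBEAT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) \S+ -?\d+ "
    r"(?P<osd>osd\.\d+) \d+ heartbeat_check: no reply from (?P<peer>osd\.\d+)"
)


def failed_heartbeats(lines, window=timedelta(minutes=10), now=None):
    """Count heartbeat_check failures per (reporter, peer) pair.

    Only entries with a timestamp >= now - window are counted. If `now`
    is None, the timestamp of the newest matching entry is used, so old
    log files can be analysed without knowing the current time.
    """
    events = []
    for line in lines:
        m = HEARTBEAT_RE.match(line)
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
            events.append((ts, m.group("osd"), m.group("peer")))

    if not events:
        return Counter()

    if now is None:
        now = max(ts for ts, _, _ in events)
    cutoff = now - window

    # Keep only events inside the window and tally them per OSD pair.
    return Counter((osd, peer) for ts, osd, peer in events if ts >= cutoff)
```

Usage would be something like passing an open log file, e.g.
failed_heartbeats(open("/var/log/ceph/ceph-osd.9.log")), assuming the
default log location. On the three lines above it reports two failures
for osd.9 -> osd.14 and one for osd.9 -> osd.15. Running it across all
OSD logs and looking for a peer that many OSDs complain about would
point at the bad host/NIC much faster than reading logs by hand.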

-- 
Mariusz Gronczewski, Administrator

Efigence S. A.
ul. Wołoska 9a, 02-583 Warszawa
T: [+48] 22 380 13 13
F: [+48] 22 380 13 14
E: [email protected]


_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com