Hi. I'm having a problem with the OSDs in my cluster.
Randomly, some OSDs get wrongly marked down. I set "mon osd min down reporters" to OSD + 1, but I still get this problem. Any tips or ideas for troubleshooting? I'm using Ceph 0.94.5 on CentOS 7.

The logs show this:

2017-12-19 16:59:26.357707 7fa9177d3700 0 -- 172.17.4.2:6830/4775054 >> 172.17.4.3:6800/2009784 pipe(0x7fa8a0907000 sd=43 :45955 s=1 pgs=1089 cs=1 l=0 c=0x7fa8a0965f00).connect got RESETSESSION
2017-12-19 16:59:26.360240 7fa8e5652700 0 -- 172.17.4.2:6830/4775054 >> 172.17.4.1:6808/6007742 pipe(0x7fa9310e3000 sd=26 :53375 s=2 pgs=5272 cs=1 l=0 c=0x7fa931045680).fault, initiating reconnect
2017-12-19 16:59:25.716758 7fa8e74c1700 0 -- 172.17.4.2:6830/4775054 >> 172.17.4.1:6826/1007559 pipe(0x7fa907052000 sd=17 :45743 s=1 pgs=2105 cs=1 l=0 c=0x7fa8a051a180).connect got RESETSESSION
2017-12-19 16:59:25.716308 7fa9849ed700 0 -- 172.17.3.2:6802/3775054 submit_message osd_op_reply(392 rbd_data.129d2042eabc234.0000000000000605 [set-alloc-hint object_size 4194304 write_size 4194304,write 0~126976] v26497'18879046 uv18879046 ondisk = 0) v6 remote, 172.17.1.3:0/5911141, failed lossy con, dropping message 0x7fa8830edb00
2017-12-19 16:59:25.718694 7fa9849ed700 0 -- 172.17.3.2:6802/3775054 submit_message osd_op_reply(10610054 rbd_data.6ccd3348ab9aac.000000000000011d [set-alloc-hint object_size 8388608 write_size 8388608,write 876544~4096] v26497'15075797 uv15075797 ondisk = 0) v6 remote, 172.17.1.4:0/1028032, failed lossy con, dropping message 0x7fa87a911700

--
Sergio A. Morales
Ingeniero de Sistemas
LINETS CHILE - 56 2 2412 5858
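
P.S. In case it helps with troubleshooting, here is a rough sketch of how the "pipe" lines in the log above could be tallied per peer address, to see whether the resets and faults cluster on one host. It is only an illustration in Python, not part of any Ceph tooling: the regular expression is an assumption based solely on the log format shown in this message, and the name tally_peers is hypothetical.

```python
import re
from collections import Counter

# Assumed shape of the messenger lines shown above:
#   ... -- <local addr> >> <peer addr> pipe(<details>).<event>
LINE_RE = re.compile(
    r"-- (?P<local>\S+) >> (?P<peer>\S+) pipe\(.*?\)\.(?P<event>.+)$"
)


def tally_peers(lines):
    """Count (peer IP, event) pairs found in messenger 'pipe' log lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line.strip())
        if match is None:
            continue  # e.g. submit_message lines, which have no '>>' peer
        peer_ip = match.group("peer").split(":")[0]
        counts[(peer_ip, match.group("event").strip())] += 1
    return counts


# The three 'pipe' lines quoted in this message.
sample = [
    "2017-12-19 16:59:26.357707 7fa9177d3700 0 -- 172.17.4.2:6830/4775054 >> "
    "172.17.4.3:6800/2009784 pipe(0x7fa8a0907000 sd=43 :45955 s=1 pgs=1089 "
    "cs=1 l=0 c=0x7fa8a0965f00).connect got RESETSESSION",
    "2017-12-19 16:59:26.360240 7fa8e5652700 0 -- 172.17.4.2:6830/4775054 >> "
    "172.17.4.1:6808/6007742 pipe(0x7fa9310e3000 sd=26 :53375 s=2 pgs=5272 "
    "cs=1 l=0 c=0x7fa931045680).fault, initiating reconnect",
    "2017-12-19 16:59:25.716758 7fa8e74c1700 0 -- 172.17.4.2:6830/4775054 >> "
    "172.17.4.1:6826/1007559 pipe(0x7fa907052000 sd=17 :45743 s=1 pgs=2105 "
    "cs=1 l=0 c=0x7fa8a051a180).connect got RESETSESSION",
]

counts = tally_peers(sample)
for (peer_ip, event), n in sorted(counts.items()):
    print(peer_ip, event, n)
```

On a real OSD log the same function could be fed the lines of the file instead of the sample list; lines that do not fit the assumed pattern are simply skipped.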
