Very interesting documentation about this subject can be found here: http://docs.ceph.com/docs/hammer/rados/configuration/mon-osd-interaction/
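
In particular, those are the settings that decide when the monitors will mark an OSD down or out. A quick way to see the values actually in effect is the monitor admin socket; this is only a sketch (mon.ceph-mon-1 is taken from the cluster below, and the option list is just the subset that looked relevant here):

    ceph daemon mon.ceph-mon-1 config show | egrep \
      'mon_osd_min_down_reporters|mon_osd_min_up_ratio|mon_osd_report_timeout|mon_osd_down_out_interval|osd_heartbeat_grace'

They can also be changed at runtime with injectargs, for example
ceph tell mon.* injectargs '--mon_osd_min_down_reporters 1' (the value 1 is
only an illustration).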
2016-12-22 12:26 GMT+01:00 Stéphane Klein <[email protected]>:

> Hi,
>
> I have:
>
> * 3 mon
> * 3 osd
>
> When I shut down one osd, everything works great:
>
>     cluster 7ecb6ebd-2e7a-44c3-bf0d-ff8d193e03ac
>      health HEALTH_WARN
>             43 pgs degraded
>             43 pgs stuck unclean
>             43 pgs undersized
>             recovery 24/70 objects degraded (34.286%)
>             too few PGs per OSD (28 < min 30)
>             1/3 in osds are down
>      monmap e1: 3 mons at {ceph-mon-1=172.28.128.2:6789/0,ceph-mon-2=172.28.128.3:6789/0,ceph-mon-3=172.28.128.4:6789/0}
>             election epoch 10, quorum 0,1,2 ceph-mon-1,ceph-mon-2,ceph-mon-3
>      osdmap e22: 3 osds: 2 up, 3 in; 43 remapped pgs
>             flags sortbitwise,require_jewel_osds
>       pgmap v169: 64 pgs, 1 pools, 77443 kB data, 35 objects
>             252 MB used, 1484 GB / 1484 GB avail
>             24/70 objects degraded (34.286%)
>                   43 active+undersized+degraded
>                   21 active+clean
>
> But when I shut down 2 osds, the Ceph cluster doesn't see that the second
> osd node is down :(
>
> root@ceph-mon-1:/home/vagrant# ceph status
>     cluster 7ecb6ebd-2e7a-44c3-bf0d-ff8d193e03ac
>      health HEALTH_WARN
>             clock skew detected on mon.ceph-mon-2
>             pauserd,pausewr,sortbitwise,require_jewel_osds flag(s) set
>             Monitor clock skew detected
>      monmap e1: 3 mons at {ceph-mon-1=172.28.128.2:6789/0,ceph-mon-2=172.28.128.3:6789/0,ceph-mon-3=172.28.128.4:6789/0}
>             election epoch 10, quorum 0,1,2 ceph-mon-1,ceph-mon-2,ceph-mon-3
>      osdmap e26: 3 osds: 2 up, 2 in
>             flags pauserd,pausewr,sortbitwise,require_jewel_osds
>       pgmap v203: 64 pgs, 1 pools, 77443 kB data, 35 objects
>             219 MB used, 989 GB / 989 GB avail
>                   64 active+clean
>
> 2 osds up! Why?
>
> root@ceph-mon-1:/home/vagrant# ping ceph-osd-1 -c1
> --- ceph-osd-1 ping statistics ---
> 1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms
>
> root@ceph-mon-1:/home/vagrant# ping ceph-osd-2 -c1
> --- ceph-osd-2 ping statistics ---
> 1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms
>
> root@ceph-mon-1:/home/vagrant# ping ceph-osd-3 -c1
> --- ceph-osd-3 ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 0.278/0.278/0.278/0.000 ms
>
> My configuration:
>
> ceph_conf_overrides:
>   global:
>     osd_pool_default_size: 2
>     osd_pool_default_min_size: 1
>
> The full Ansible configuration is here:
> https://github.com/harobed/poc-ceph-ansible/blob/master/vagrant-3mons-3osd/hosts/group_vars/all.yml#L11
>
> What is my mistake? Is it a Ceph bug?
>
> Best regards,
> Stéphane
>
> --
> Stéphane Klein <[email protected]>
> blog: http://stephane-klein.info
> cv : http://cv.stephane-klein.info
> Twitter: http://twitter.com/klein_stephane

--
Stéphane Klein <[email protected]>
blog: http://stephane-klein.info
cv : http://cv.stephane-klein.info
Twitter: http://twitter.com/klein_stephane
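
For reference, the ceph_conf_overrides block quoted above should end up in /etc/ceph/ceph.conf on the nodes roughly as follows (a sketch, assuming ceph-ansible's usual templating of the global section):

    [global]
    osd pool default size = 2
    osd pool default min size = 1

and the monitors' view of the OSDs can be double-checked with (names taken from the output above):

    ceph osd tree                  # per-OSD up/down and in/out state
    ceph health detail             # which OSDs/PGs are behind each warning
    ceph osd dump | grep flags     # shows the pauserd,pausewr flags on the osdmap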
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
