Hi,
I've got a problem with Ceph 0.60: when an OSD fails, it goes down, but
it never goes out, not even after the down out interval. This works as
expected in 0.56.4.
When I stop an OSD, even after mon_osd_down_out_interval seconds, the
OSD is not marked out of the cluster:
ceph osd tree
# id    weight  type name               up/down reweight
-1      6       root default
-3      6               rack unknownrack
-2      2                       host it-test-15.lab
0       1                               osd.0   down    1
1       1                               osd.1   up      1
-4      2                       host it-test-16.lab
2       1                               osd.2   up      1
3       1                               osd.3   up      1
-5      2                       host it-test-17.lab
4       1                               osd.4   up      1
5       1                               osd.5   up      1
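Even well past the interval, the osdmap agrees, e.g.:
ceph osd dump | grep osd.0
still reports osd.0 as down but in (weight 1).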
If I manually mark this OSD out with 'ceph osd out osd.0', the cluster
rebalances the data properly.
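(To repeat the test I bring it back with:
ceph osd in osd.0
and restart the OSD daemon.)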
This affects Ceph 0.60; with v0.56.4 the OSD is set out after the down
out interval. Did I miss something? Is there a new option in v0.60?
It always occurs on newly created clusters. I did not edit the crushmap,
and I've unset the noout flag just in case:
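ceph osd unset noout
ceph osd dump | grep flags   # noout does not appear in the osdmap flags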
My ceph.conf:
[global]
auth cluster required = none
auth service required = none
auth client required = none
[mon]
mon osd down out interval = 60
[osd]
osd mkfs type = xfs
osd mkfs options xfs = -f -i size=2048
osd mount options xfs = inode64,noatime
[mon.a]
host = it-test-8.lab
mon addr = 192.168.32.200:6789
[osd.0]
host = it-test-15.lab
devs = /dev/sda3
[osd.1]
host = it-test-15.lab
devs = /dev/sdb1
[osd.2]
host = it-test-16.lab
devs = /dev/sda3
[osd.3]
host = it-test-16.lab
devs = /dev/sdb1
[osd.4]
host = it-test-17.lab
devs = /dev/sda3
[osd.5]
host = it-test-17.lab
devs = /dev/sdb1
My running config for the down/out settings:
ceph --admin-daemon /var/run/ceph/ceph-mon.a.asok config show | grep _out_
"mon_osd_adjust_down_out_interval": "true",
"mon_osd_auto_mark_auto_out_in": "true",
"mon_osd_down_out_interval": "60",
"mon_osd_down_out_subtree_limit": "rack",
Thanks.