I set up a simple Ceph cluster with 5 OSD nodes and 1 monitor node. Each OSD is
on a different host.
The erasure-coded pool has 64 PGs and an initial state of HEALTH_OK.
The goal is to deliberately break as many OSDs as possible, up to the number of
coding chunks m, in order to evaluate the read performance when these chunks
are missing. By definition of Reed-Solomon coding, any m chunks out of the
n = k + m total chunks can be missing; with k = 2 and m = 3 this means any 2 of
the 5 chunks are enough to reconstruct an object. To simulate the loss of an
OSD I do the following:
ceph osd set noup
ceph osd down <ID>
ceph osd out <ID>
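For completeness, taking down several OSDs in one go looks roughly like this
(the OSD IDs 1, 3 and 4 below are just an example of a random selection):

ceph osd set noup
for id in 1 3 4; do
    ceph osd down $id
    ceph osd out $id
done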
With the above procedure I should be able to kill up to m = 3 OSDs without
losing any data. However, when I kill m = 3 randomly selected OSDs, all
requests to the cluster are blocked and HEALTH_ERR is shown. The OSD on which
the requests are blocked is working properly and [in,up] in the cluster.
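In case it helps, the affected PGs and the blocked requests can be listed with:

ceph health detail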
My question: why is it not possible to kill m = 3 OSDs and still operate the
cluster? Isn't that equivalent to losing data, which shouldn't happen in this
particular configuration? Is my cluster set up properly, or am I missing
something?
Thank you for your help!
I have attached all relevant information about the cluster and status outputs:
Erasure coding profile:
jerasure-per-chunk-alignment=false
k=2
m=3
plugin=jerasure
ruleset-failure-domain=host
ruleset-root=default
technique=reed_sol_van
w=8
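For reference, the profile and the pool were created roughly like this (the
profile name ec-23-profile is illustrative):

ceph osd erasure-code-profile set ec-23-profile \
    k=2 m=3 plugin=jerasure technique=reed_sol_van \
    ruleset-failure-domain=host
ceph osd pool create ecpool 64 64 erasure ec-23-profile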
Content of ceph.conf:
[global]
fsid = 6353b831-22c3-424c-a8f1-495788e6b4e2
mon_initial_members = ip-172-31-27-142
mon_host = 172.31.27.142
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
osd_pool_default_min_size = 2
osd_pool_default_size = 2
mon_allow_pool_delete = true
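Note that the effective size/min_size of the erasure coded pool itself may
differ from the replicated-pool defaults above; they can be checked with:

ceph osd pool get ecpool size
ceph osd pool get ecpool min_size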
Crush rule:
rule ecpool {
    ruleset 1
    type erasure
    min_size 2
    max_size 5
    step set_chooseleaf_tries 5
    step set_choose_tries 100
    step take default
    step chooseleaf indep 0 type host
    step emit
}
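A rule in this text form can be obtained by decompiling the crush map:

ceph osd getcrushmap -o crush.bin
crushtool -d crush.bin -o crush.txt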
Output of 'ceph -s' while the cluster is degraded:
cluster 6353b831-22c3-424c-a8f1-495788e6b4e2
health HEALTH_ERR
38 pgs are stuck inactive for more than 300 seconds
26 pgs degraded
38 pgs incomplete
26 pgs stuck degraded
38 pgs stuck inactive
64 pgs stuck unclean
26 pgs stuck undersized
26 pgs undersized
2 requests are blocked > 32 sec
recovery 3/5 objects degraded (60.000%)
recovery 1/5 objects misplaced (20.000%)
noup flag(s) set
monmap e2: 1 mons at {ip-172-31-27-142=172.31.27.142:6789/0}
election epoch 6, quorum 0 ip-172-31-27-142
mgr no daemons active
osdmap e194: 5 osds: 2 up, 2 in; 64 remapped pgs
flags noup,sortbitwise,require_jewel_osds,require_kraken_osds
pgmap v970: 64 pgs, 1 pools, 592 bytes data, 1 objects
79668 kB used, 22428 MB / 22505 MB avail
3/5 objects degraded (60.000%)
1/5 objects misplaced (20.000%)
38 incomplete
15 active+undersized+degraded
11 active+undersized+degraded+remapped
Output of 'ceph health' while the cluster is degraded:
HEALTH_ERR 38 pgs are stuck inactive for more than 300 seconds; 26 pgs
degraded; 38 pgs incomplete; 26 pgs stuck degraded; 38 pgs stuck inactive; 64
pgs stuck unclean; 26 pgs stuck undersized; 26 pgs undersized; 2 requests are
blocked > 32 sec; recovery 3/5 objects degraded (60.000%); recovery 1/5 objects
misplaced (20.000%); noup flag(s) set