I set up a simple Ceph cluster with 5 OSD nodes and 1 monitor node. Each OSD is
on a different host.
The erasure coded pool has 64 PGs and an initial state of HEALTH_OK.

The goal is to deliberately take down as many OSDs as possible, up to the number
of coding chunks m, in order to evaluate read performance when these chunks are
missing. By definition of Reed-Solomon coding, any m chunks out of the n = k + m
total chunks can be missing without data loss. To simulate the loss of an OSD
I'm doing the following:

ceph osd set noup
ceph osd down <ID>
ceph osd out <ID>
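To take several OSDs down in one go, the three commands above can be wrapped in
a small script. This is a dry-run sketch: the OSD IDs are placeholders (pick
real ones from `ceph osd tree`), and the commands are only printed until CEPH is
changed to the real binary.

```shell
#!/bin/sh
# Sketch of the procedure above applied to several OSDs at once.
# The IDs passed in are hypothetical; pick real ones from `ceph osd tree`.

CEPH="echo ceph"   # dry-run: print the commands; set CEPH=ceph to run them

simulate_osd_loss() {
    $CEPH osd set noup          # prevent downed OSDs from rejoining
    for id in "$@"; do
        $CEPH osd down "$id"
        $CEPH osd out "$id"
    done
}

simulate_osd_loss 1 3 4        # take three (= m) OSDs down and out
```
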

With the above procedure I should be able to kill up to m = 3 OSDs without
losing any data. However, when I kill m = 3 randomly selected OSDs, all requests
to the cluster are blocked and HEALTH_ERR is shown. The OSD on which the
requests are blocked is working properly and [in,up] in the cluster.

My question: why is it not possible to kill m = 3 OSDs and still operate the
cluster? Isn't that equivalent to losing data, which shouldn't happen in this
particular configuration? Is my cluster set up properly, or am I missing
something?

Thank you for your help!

I have attached all relevant information about the cluster and status outputs:

Erasure coding profile:

jerasure-per-chunk-alignment=false
k=2
m=3
plugin=jerasure
ruleset-failure-domain=host
ruleset-root=default
technique=reed_sol_van
w=8
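For reference, a profile like the one above can be created as follows (the
profile name is hypothetical, and `ruleset-failure-domain` is the jewel-era
spelling of the option later renamed `crush-failure-domain`). This is a sketch
against a live cluster, not something runnable standalone:

```shell
# Hypothetical profile name "ec-k2m3"; matches the profile dump above.
ceph osd erasure-code-profile set ec-k2m3 \
    k=2 m=3 \
    plugin=jerasure technique=reed_sol_van \
    ruleset-failure-domain=host

# Create the 64-PG erasure-coded pool from that profile.
ceph osd pool create ecpool 64 64 erasure ec-k2m3
```
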

Content of ceph.conf:

[global]
fsid = 6353b831-22c3-424c-a8f1-495788e6b4e2
mon_initial_members = ip-172-31-27-142
mon_host = 172.31.27.142
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
osd_pool_default_min_size = 2
osd_pool_default_size = 2
mon_allow_pool_delete = true
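Note that `osd_pool_default_size`/`osd_pool_default_min_size` only set defaults
at pool-creation time; an erasure-coded pool derives its size from k + m, so it
is worth checking what values the pool actually ended up with (pool name
`ecpool` assumed). These commands need a live cluster:

```shell
# Diagnostic sketch: inspect the values the EC pool actually uses.
ceph osd pool get ecpool size
ceph osd pool get ecpool min_size
```
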

Crush rule:

rule ecpool {
        ruleset 1
        type erasure
        min_size 2
        max_size 5
        step set_chooseleaf_tries 5
        step set_choose_tries 100
        step take default
        step chooseleaf indep 0 type host
        step emit
}

Output of 'ceph -s' while cluster is degraded:

    cluster 6353b831-22c3-424c-a8f1-495788e6b4e2
     health HEALTH_ERR
            38 pgs are stuck inactive for more than 300 seconds
            26 pgs degraded
            38 pgs incomplete
            26 pgs stuck degraded
            38 pgs stuck inactive
            64 pgs stuck unclean
            26 pgs stuck undersized
            26 pgs undersized
            2 requests are blocked > 32 sec
            recovery 3/5 objects degraded (60.000%)
            recovery 1/5 objects misplaced (20.000%)
            noup flag(s) set
     monmap e2: 1 mons at {ip-172-31-27-142=172.31.27.142:6789/0}
            election epoch 6, quorum 0 ip-172-31-27-142
        mgr no daemons active
     osdmap e194: 5 osds: 2 up, 2 in; 64 remapped pgs
            flags noup,sortbitwise,require_jewel_osds,require_kraken_osds
      pgmap v970: 64 pgs, 1 pools, 592 bytes data, 1 objects
            79668 kB used, 22428 MB / 22505 MB avail
            3/5 objects degraded (60.000%)
            1/5 objects misplaced (20.000%)
                  38 incomplete
                  15 active+undersized+degraded
                  11 active+undersized+degraded+remapped

Output of 'ceph health' while cluster is degraded:

HEALTH_ERR 38 pgs are stuck inactive for more than 300 seconds; 26 pgs 
degraded; 38 pgs incomplete; 26 pgs stuck degraded; 38 pgs stuck inactive; 64 
pgs stuck unclean; 26 pgs stuck undersized; 26 pgs undersized; 2 requests are 
blocked > 32 sec; recovery 3/5 objects degraded (60.000%); recovery 1/5 objects 
misplaced (20.000%); noup flag(s) set
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
