Hi,
I have 5 data nodes (BlueStore, Kraken), each with 24 OSDs, and I have
enabled the optimal CRUSH tunables.
I'd like to try to "really" use EC pools, but so far I've hit cluster
lockups when using 3+2 EC pools with a host failure domain -- for instance
when a host was down ;)
Since I'd like erasure coding to be more than a "nice to have feature once
you own 12+ Ceph data nodes", I wanted to try this:
- Use a 14+6 EC rule
- Use a 14+6 EC rule
- And for each PG (20 shards in total):
  o select 4 hosts
  o on each of these hosts, select 5 OSDs
In order to do that, I created this rule in the CRUSH map:

rule 4hosts_20shards {
    ruleset 3
    type erasure
    min_size 20
    max_size 20
    step set_chooseleaf_tries 5
    step set_choose_tries 100
    step take default
    step choose indep 4 type host
    step chooseleaf indep 5 type osd
    step emit
}
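(For completeness, I added the rule via the usual decompile/edit/recompile
cycle; the file names below are just examples:)

```shell
# Export, decompile, edit, recompile and inject the CRUSH map
# (requires admin access to the cluster; file names are arbitrary)
ceph osd getcrushmap -o crushmap.bin       # export the binary map
crushtool -d crushmap.bin -o crushmap.txt  # decompile to editable text
# ... add the 4hosts_20shards rule to crushmap.txt ...
crushtool -c crushmap.txt -o crushmap.new  # recompile
ceph osd setcrushmap -i crushmap.new       # inject the new map
```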
I then created an EC pool with this erasure profile:

ceph osd erasure-code-profile set erasurep14_6_osd ruleset-failure-domain=osd k=14 m=6
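(And created the pool on top of it; the pool name and PG count below are
illustrative, not necessarily what I used:)

```shell
# Create an EC pool using the profile and the custom CRUSH rule
# (pool name and PG counts are example values)
ceph osd pool create ecpool14_6 1024 1024 erasure erasurep14_6_osd 4hosts_20shards
```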
I hoped this would allow losing 1 host completely without locking up the
cluster, and I have the impression it is working...
But. There's always a but ;)
I tried taking all the OSDs on one node down by stopping its ceph-osd
daemons, and according to Ceph the cluster is now unhealthy.
"ceph health detail" gives me, for instance, this (for the 3+2 and 14+6 pools):
pg 5.18b is active+undersized+degraded, acting [57,47,2147483647,23,133]
pg 9.186 is active+undersized+degraded, acting
[2147483647,2147483647,2147483647,2147483647,2147483647,133,142,125,131,137,50,48,55,65,52,16,13,18,22,3]
My question therefore is: why aren't the down PGs remapped onto my 5th data
node, since I made sure the 20 EC shards were spread over 4 hosts only?
I thought/hoped that because the OSDs were down, the data would be rebuilt
onto another OSD/host.
I can understand that the 3+2 EC pool cannot allocate OSDs on another host,
because 3+2=5 already uses all 5 hosts, but I don't understand why the 14+6
EC pool/PGs do not rebuild somewhere else.
I do not find anything useful in a "ceph pg query": the up and acting sets
are equal and contain the 2147483647 value (which means "none" as far as I
understood).
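(For what it's worth, that value is simply the maximum signed 32-bit
integer, which CRUSH uses as its "no OSD mapped here" placeholder:)

```shell
# 2147483647 = 2^31 - 1, the largest signed 32-bit integer;
# CRUSH emits it in up/acting sets when no OSD could be mapped to a shard
echo $(( (1 << 31) - 1 ))   # prints 2147483647
```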
I've also tried to "ceph osd out" all the OSDs from one host: in that case,
the 3+2 EC PGs behave as before, but the 14+6 EC PGs seem happy, despite the
fact that they still report the out OSDs as up and acting.
Is my CRUSH rule that wrong?
Is it possible to do what I want?
Thanks for any hints...
Regards
Frederic
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com