We are observing strange behavior with some configurations: PGs stay in a
degraded state after a single OSD failure.

I can also reproduce the behavior with crushtool, using the following map:

----------crush map---------
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7
device 8 osd.8
device 9 osd.9
device 10 osd.10
device 11 osd.11

# types
type 0 osd
type 1 host
type 2 rack
type 3 row
type 4 room
type 5 datacenter
type 6 root

# buckets
host prox-ceph-1 {
        id -2           # do not change unnecessarily
        alg straw
        hash 0  # rjenkins1
        item osd.0 weight 1
        item osd.1 weight 1
        item osd.2 weight 1
        item osd.3 weight 1
}
host prox-ceph-2 {
        id -3           # do not change unnecessarily
        alg straw
        hash 0  # rjenkins1
        item osd.4 weight 1
        item osd.5 weight 1
        item osd.6 weight 1
        item osd.7 weight 1
}
host prox-ceph-3 {
        id -4           # do not change unnecessarily
        alg straw
        hash 0  # rjenkins1
        item osd.8 weight 1
        item osd.9 weight 1
        item osd.10 weight 1
        item osd.11 weight 1
}

root default {
        id -1           # do not change unnecessarily
        alg straw
        hash 0  # rjenkins1
        item prox-ceph-1 weight 4
        item prox-ceph-2 weight 4
        item prox-ceph-3 weight 4
}

# rules
rule data {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}

# end crush map
--------------------------------------
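
For reference, the compile step looks like this (assuming the text map
above is saved as 'crush-map.txt'; the file names are just the ones I
used):

# crushtool -c crush-map.txt -o crush-test.map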

After compiling that map to 'crush-test.map' we run:

# crushtool --test -i 'crush-test.map' --rule 0 --num-rep 3 --weight 11 0 --show-statistics

I set '--weight 11 0' to mark osd.11 as 'out'. The result is:

...
rule 0 (data) num_rep 3 result size == 2:       111/1024
rule 0 (data) num_rep 3 result size == 3:       913/1024

so 111 of 1024 PGs (about 11%) end up in a degraded state with only two
replicas. I would expect the data to get re-distributed to the remaining
OSDs instead.
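
The affected mappings can also be listed one by one; if I read the
crushtool man page correctly, '--show-bad-mappings' prints every input
that maps to fewer than num-rep OSDs:

# crushtool --test -i 'crush-test.map' --rule 0 --num-rep 3 --weight 11 0 --show-bad-mappings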

Can someone explain why that happens?


