Hi all,

We tested the 4 cases.  Cases 1-3 behaved as expected, but for case 4 the 
rebuild did not take place in the surviving room, as Gregory suspected.  We 
repeated case 4 several times, failing each room in turn, and got the same 
result.  We're running Mimic 13.2.2.

Example layout:

Room 1
Host 1 osds: 2,5
Host 2 osds: 1,3

Room 2  <-- failed room
Host 3 osds: 0,4
Host 4 osds: 6,7


Before (pg 5.62: active+clean, up/acting [0,7,5]):
5.62          0                  0        0         0       0         0    0        0 active+clean 2019-02-12 04:47:28.183375            0'0      3643:2299   [0,7,5]          0   [0,7,5]              0            0'0 2019-02-12 04:47:28.183218             0'0 2019-02-11 01:20:51.276922             0

After (pg 5.62: undersized+peered, up/acting shrunk to [5] -- no rebuild onto a second Room 1 OSD):
5.62          0                  0        0         0       0         0    0        0          undersized+peered 2019-02-12 09:10:59.101096            0'0      3647:2284   [5]          5    [5]              5            0'0 2019-02-12 04:47:28.183218             0'0 2019-02-11 01:20:51.276922             0
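To illustrate what we think happened (a toy model, assumed and simplified, not real Ceph code; the raw pick vector for pg 5.62 is hypothetical): CRUSH keeps picks from the dead room as holes in the location vector instead of re-running selection at a higher level of the hierarchy, which matches the acting set collapsing to [5] rather than being refilled from the surviving room:

```python
# Toy illustration (assumed and simplified; NOT real Ceph code) of why
# the acting set collapsed to [5] instead of being rebuilt in Room 1.
# The rule generates 4 candidates (2 hosts x 2 rooms), but with pool
# size = 3 only the first 3 slots are used, e.g. for pg 5.62:
raw_picks = [0, 7, 5]        # hypothetical: 2 picks from Room 2, 1 from Room 1
dead_osds = {0, 4, 6, 7}     # all of Room 2 (hosts 3 and 4) is down

# CRUSH keeps failed picks as holes (CRUSH_ITEM_NONE, None here) rather
# than reshuffling surviving OSDs into the earlier slots.
with_holes = [o if o not in dead_osds else None for o in raw_picks]
acting = [o for o in with_holes if o is not None]
print(with_holes, acting)    # [None, None, 5] [5]

# Rebuilding to 2 copies in Room 1 would require reaching the 4th
# candidate (the second Room 1 host), which 13.2.2 evidently didn't do.
```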

FYI.   Sorry for the belated report.

Thanks a lot.
/st


From: Gregory Farnum <[email protected]>
Sent: Monday, November 26, 2018 9:27 PM
To: ST Wong (ITSC) <[email protected]>
Cc: [email protected]
Subject: Re: [ceph-users] will crush rule be used during object relocation in 
OSD failure ?

On Fri, Nov 23, 2018 at 11:01 AM ST Wong (ITSC) <[email protected]> wrote:

Hi all,



We've 8 osd hosts, 4 in room 1 and 4 in room2.

A pool with size = 3 is created using the following CRUSH rule, to cater for 
room failure.


rule multiroom {
        id 0
        type replicated
        min_size 2
        max_size 4
        step take default
        step choose firstn 2 type room
        step chooseleaf firstn 2 type host
        step emit
}
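As a rough sketch of what this rule should produce (a toy simulation under assumptions: the 8-OSD layout above, and a seeded RNG standing in for CRUSH's deterministic hashing -- not real CRUSH):

```python
import random

# Toy simulation (assumed layout, seeded RNG in place of CRUSH hashing)
# of "choose firstn 2 type room / chooseleaf firstn 2 type host" for a
# size = 3 pool.
ROOMS = {
    "room1": {"host1": [2, 5], "host2": [1, 3]},
    "room2": {"host3": [0, 4], "host4": [6, 7]},
}

def map_pg(pg_seed, size=3):
    rng = random.Random(pg_seed)
    picked_rooms = rng.sample(sorted(ROOMS), 2)         # choose firstn 2 type room
    candidates = []
    for room in picked_rooms:
        for host in rng.sample(sorted(ROOMS[room]), 2):  # chooseleaf firstn 2 type host
            candidates.append(rng.choice(ROOMS[room][host]))
    return candidates[:size]                             # emit; pool keeps first 3

room_of = {o: r for r, hosts in ROOMS.items()
           for osds in hosts.values() for o in osds}
up = map_pg(0x562)
print(up, [room_of[o] for o in up])  # always 2 OSDs in one room, 1 in the other
```

This matches expectation 1 below: the 2+1 split across rooms is guaranteed by the rule, but which room gets the 2 copies depends on the per-PG hash.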




We're expecting:

1. For each object there are always 2 replicas in one room and 1 replica in 
the other, making size = 3.  But we can't control which room gets 1 or 2 replicas.

Right.


2. In case an OSD host fails, Ceph will assign remaining OSDs to the same PG 
to hold the replicas that were on the failed host.  Selection is based on the 
pool's CRUSH rule, thus maintaining the same failure domain -- it won't put 
all replicas in the same room.

Yes, if a host fails the copies it held will be replaced by new copies in the 
same room.


3. In case the entire room holding 1 replica fails, the pool will remain 
degraded but won't do any replica relocation.

Right.


4. In case the entire room holding 2 replicas fails, Ceph will make use of 
OSDs in the surviving room to rebuild 2 replicas.  The pool will not be 
writeable until all objects have 2 copies (unless we make pool size = 4?).  
When recovery completes, the pool will remain degraded until the failed room 
recovers.

Hmm, I'm actually not sure if this will work out — because CRUSH is 
hierarchical, it will keep trying to select hosts from the dead room and will 
fill out the location vector's first two spots with -1. It could be that Ceph 
will skip all those "nonexistent" entries and just pick the two copies from 
slots 3 and 4, but it might not. You should test this carefully and report back!
-Greg

Is our understanding correct?  Thanks a lot.
We'll do some simulation later to verify.

Regards,
/stwong
_______________________________________________
ceph-users mailing list
[email protected]<mailto:[email protected]>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
