Re: [ceph-users] will crush rule be used during object relocation in OSD failure ?

Eugen Block Tue, 12 Feb 2019 01:32:50 -0800

Hi,

I came to the same conclusion after doing various tests with rooms and failure domains. I agree with Maged and suggest using size=4, min_size=2 for replicated pools. It's more overhead, but you can survive the loss of one room and even one more OSD (of the affected PG) without losing data. You'll also have the certainty that there are always two replicas per room, no guessing or hoping which room is more likely to fail.
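
The arithmetic behind this recommendation can be sketched in a few lines of Python. This is only an illustration, assuming the two-room rule from the original post (at most two replicas per room) and the usual behaviour that a PG serves I/O only while at least min_size replicas are available. The function names and the scenario table are made up for this example.

```python
def surviving_replicas(per_room, failed_room):
    """Number of replicas left after one whole room is lost."""
    return sum(n for room, n in per_room.items() if room != failed_room)


def stays_active(per_room, failed_room, min_size):
    """True if the PG still has at least min_size replicas."""
    return surviving_replicas(per_room, failed_room) >= min_size


# Replica counts per room for the two pool sizes discussed in the thread.
scenarios = {
    "size=3": {"room1": 1, "room2": 2},
    "size=4": {"room1": 2, "room2": 2},
}

for name, layout in scenarios.items():
    for failed in layout:
        left = surviving_replicas(layout, failed)
        ok = stays_active(layout, failed, min_size=2)
        print(f"{name}, {failed} fails: {left} replica(s) left, active={ok}")
```

Under these assumptions only the size=3 layout can drop below min_size, namely when the room holding two replicas is the one that fails.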


If the overhead is too high could EC be an option for your setup?

Regards,
Eugen


Quoting "ST Wong (ITSC)" <[email protected]>:

Hi all,
Tested 4 cases. Cases 1-3 behaved as expected, while for case 4 the rebuild didn’t take place in the surviving room, as Gregory mentioned. Repeating case 4 several times on both rooms gave the same result. We’re running Mimic 13.2.2.
E.g.

Room 1
Host 1 osd: 2,5
Host 2 osd: 1,3

Room 2  <-- failed room
Host 3 osd: 0,4
Host 4 osd: 6,7


Before:
5.62 0 0 0 0 0 0 0 0 active+clean 2019-02-12 04:47:28.183375 0'0 3643:2299 [0,7,5] 0 [0,7,5] 0 0'0 2019-02-12 04:47:28.183218 0'0 2019-02-11 01:20:51.276922 0
After:
5.62 0 0 0 0 0 0 0 0 undersized+peered 2019-02-12 09:10:59.101096 0'0 3647:2284 [5] 5 [5] 5 0'0 2019-02-12 04:47:28.183218 0'0 2019-02-11 01:20:51.276922 0
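
To make the before/after lines easier to read, here is a small Python sketch that maps the acting set of PG 5.62 onto the room layout listed above. The OSD-to-room table is copied from this message. The helper name is made up for the illustration.

```python
from collections import Counter

# OSD id -> room, taken from the host/OSD layout in this message.
OSD_ROOM = {
    2: "room1", 5: "room1", 1: "room1", 3: "room1",
    0: "room2", 4: "room2", 6: "room2", 7: "room2",
}


def replicas_per_room(acting):
    """Count how many OSDs of an acting set live in each room."""
    return dict(Counter(OSD_ROOM[osd] for osd in acting))


before = [0, 7, 5]   # acting set while both rooms were up
after = [5]          # acting set after room 2 failed

print("before:", replicas_per_room(before))
print("after: ", replicas_per_room(after))
```

PG 5.62 had two of its three replicas in room 2, so losing that room left a single copy on osd.5, and no new copies were created in room 1.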
FYI. Sorry for the belated report.

Thanks a lot.
/st


From: Gregory Farnum <[email protected]>
Sent: Monday, November 26, 2018 9:27 PM
To: ST Wong (ITSC) <[email protected]>
Cc: [email protected]
Subject: Re: [ceph-users] will crush rule be used during object relocation in OSD failure ?
On Fri, Nov 23, 2018 at 11:01 AM ST Wong (ITSC) <[email protected]> wrote:
Hi all,



We have 8 osd hosts, 4 in room 1 and 4 in room 2.
A pool with size = 3 using the following crush rule was created, to cater for room failure.
rule multiroom {
        id 0
        type replicated
        min_size 2
        max_size 4
        step take default
        step choose firstn 2 type room
        step chooseleaf firstn 2 type host
        step emit
}
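
The effect of the two steps can be illustrated with a toy Python model. It is not CRUSH: the hash-based ranking below is a stand-in assumption for CRUSH's real bucket selection, the host names are hypothetical, and each host is treated as a single leaf. It only shows how "pick 2 rooms, then 2 hosts in each, keep the first size entries" produces a 2+1 split for size=3 and a 2+2 split for size=4.

```python
import hashlib
from collections import Counter

# Hypothetical hierarchy: 2 rooms with 4 hosts each, as described in the post.
ROOMS = {
    "room1": ["host1", "host2", "host3", "host4"],
    "room2": ["host5", "host6", "host7", "host8"],
}


def rank(pg, item):
    """Deterministic pseudo-random score (stand-in for CRUSH hashing)."""
    return hashlib.sha256(f"{pg}:{item}".encode()).hexdigest()


def place(pg, size):
    """Toy version of the multiroom rule for a single PG."""
    # step choose firstn 2 type room
    rooms = sorted(ROOMS, key=lambda r: rank(pg, r))[:2]
    picked = []
    for room in rooms:
        # step chooseleaf firstn 2 type host
        hosts = sorted(ROOMS[room], key=lambda h: rank(pg, h))[:2]
        picked.extend((room, host) for host in hosts)
    # step emit: only the first `size` entries are used by the pool
    return picked[:size]


for size in (3, 4):
    splits = Counter()
    for pg in range(128):
        per_room = Counter(room for room, _ in place(pg, size))
        splits[tuple(sorted(per_room.values()))] += 1
    print(f"size={size}: replica split per room -> {dict(splits)}")
```

In this model every PG of a size=3 pool ends up with two replicas in whichever room was picked first and one in the other, which is why the room holding two replicas cannot be controlled.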




We're expecting:
1. For each object, there are always 2 replicas in one room and 1 replica in the other room, making size=3. But we can't control which room has 1 or 2 replicas.
Right.
2. In case an osd host fails, ceph will assign remaining osds to the same PG to hold the replicas from the failed osd host. Selection is based on the crush rule of the pool, thus maintaining the same failure domain: it won't put all replicas in the same room.
Yes, if a host fails the copies it held will be replaced by new copies in the same room.
3. In case the entire room with 1 replica fails, the pool will remain degraded but won't do any replica relocation.
Right.
4. In case the entire room with 2 replicas fails, ceph will make use of osds in the surviving room to make 2 replicas. The pool will not be writeable before all objects have 2 copies (unless we make pool size=4?). Then, when recovery is complete, the pool will remain in degraded state until the failed room recovers.
Hmm, I'm actually not sure if this will work out. Because CRUSH is hierarchical, it will keep trying to select hosts from the dead room and will fill out the location vector's first two spots with -1. It could be that Ceph will skip all those "nonexistent" entries and just pick the two copies from slots 3 and 4, but it might not. You should test this carefully and report back!
-Greg

Is our understanding correct?  Thanks a lot.
Will do some simulation later to verify.

Regards,
/stwong
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



