Hi,

Thanks.

> Of course, the 4 OSDs left working now want to self-heal by recreating all
> objects stored on the 4 split-off OSDs, and have a huge recovery job. And you
> may risk that the OSDs go into a too_full error, unless you have enough free
> space in your OSDs to recreate all the data from the defective part of the
> cluster. Or they will be stuck in recovery mode until you get the second room
> running; this depends on your CRUSH map.

Does that mean we have to give the 4 OSD machines sufficient space to hold all 
the data, and thus the usable space will be halved?

> The point is that splitting the cluster hurts. And if HA is the most
> important thing, then you may want to check out rbd mirror.

Will consider that when there is budget to set up another Ceph cluster for rbd mirror.

Thanks a lot.
Rgds
/st wong

From: ceph-users [mailto:[email protected]] On Behalf Of Ronny 
Aasen
Sent: Thursday, March 29, 2018 4:51 PM
To: [email protected]
Subject: Re: [ceph-users] split brain case

On 29.03.2018 10:25, ST Wong (ITSC) wrote:
Hi all,

We put 8 (4+4) OSD and 5 (2+3) MON servers in server rooms in 2 buildings for 
redundancy.  The buildings are connected through a direct connection, while 
servers in each building have alternate uplinks.  What will happen in case the 
link between the buildings is broken (application servers in each server room 
will continue to write to OSDs in the same room)?

Thanks a lot.
Rgds
/st wong



My guesstimate is that the server room with 3 mons will retain quorum and 
continue operation. The room with 2 mons will notice they are split off and 
block.
Assuming you have 3+2 (size=3, min_size=2) pools, one of the copies is always 
in the other server room. Some PGs will be active because you have 2 copies in 
the working room, but some PGs will be inactive until they can self-heal and 
backfill a second copy of the objects.
I assume you could use 4+2 (size=4, min_size=2) replication to avoid this issue.
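
To make that concrete, here is a rough sketch (not tested against your cluster; 
the rule name and pool name are made up) of a CRUSH rule that puts two copies 
in each room, plus the pool settings for size=4/min_size=2. It assumes your 
CRUSH map already has "room" buckets:

    rule replicated_two_rooms {
        id 1
        type replicated
        min_size 1
        max_size 10
        step take default
        step choose firstn 2 type room
        step chooseleaf firstn 2 type host
        step emit
    }

    ceph osd pool set yourpool crush_rule replicated_two_rooms
    ceph osd pool set yourpool size 4
    ceph osd pool set yourpool min_size 2

(On pre-luminous releases the rule header uses "ruleset" instead of "id", and 
the pool property is crush_ruleset.)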

Of course, the 4 OSDs left working now want to self-heal by recreating all 
objects stored on the 4 split-off OSDs, and have a huge recovery job. And you 
may risk that the OSDs go into a too_full error, unless you have enough free 
space in your OSDs to recreate all the data from the defective part of the 
cluster. Or they will be stuck in recovery mode until you get the second room 
running; this depends on your CRUSH map.
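
If you want to see how much headroom you have before that bites, the thresholds 
are visible in the osd map, and on luminous they can be adjusted (the values 
below are just examples, pick what fits your cluster):

    ceph osd dump | grep ratio
    ceph osd set-nearfull-ratio 0.85
    ceph osd set-backfillfull-ratio 0.90
    ceph osd set-full-ratio 0.95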

If you really need to split a cluster into separate rooms, I would use 3 rooms, 
with redundant data paths between them. The primary path between room A and C 
is direct; the redundant path is via A-B-C. This should reduce the disaster if 
a single path is broken.
With 1 mon in each room you can lose a whole room to power loss and still have 
a working cluster. And you would only need 33% instead of 50% of the cluster 
capacity as free space to be able to self-heal.
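
A hedged example of what that room layer could look like in the CRUSH 
hierarchy (the bucket and host names are invented for illustration):

    ceph osd crush add-bucket room-a room
    ceph osd crush add-bucket room-b room
    ceph osd crush add-bucket room-c room
    ceph osd crush move room-a root=default
    ceph osd crush move room-b root=default
    ceph osd crush move room-c root=default
    ceph osd crush move host1 room=room-a

together with a replicated rule that does "step chooseleaf firstn 0 type room", 
so each of the 3 copies lands in a different room.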

The point is that splitting the cluster hurts. And if HA is the most important 
thing, then you may want to check out rbd mirror.
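
If you get there, enabling it is roughly along these lines (a sketch only; it 
assumes a pool named rbd, clusters named site-a and site-b, an rbd-mirror 
daemon running on each side, and each cluster's config/keyring available to the 
other -- your names will differ):

    rbd mirror pool enable rbd pool     # run on both clusters
    rbd --cluster site-a mirror pool peer add rbd client.mirror@site-b
    rbd --cluster site-b mirror pool peer add rbd client.mirror@site-a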



kind regards
Ronny Aasen
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
