I’d say the cure is worse than the issue you’re trying to fix, but that’s my 
two cents.

Mark Schouten

> On 24 Jul 2019 at 21:22, Wido den Hollander <w...@42on.com> wrote:
> 
> Hi,
> 
> Is anybody using 4x (size=4, min_size=2) replication with Ceph?
> 
> The reason I'm asking is that a customer of mine asked me for a way to
> prevent a repeat of the following situation:
> 
> A cluster running with size=3 and replication over different racks was
> being upgraded from 13.2.5 to 13.2.6.
> 
> During the upgrade, which involved patching the OS as well, they
> rebooted one of the nodes. While that node was rebooting, a node in a
> different rack suddenly rebooted as well. It was unclear why this
> happened, but the node was gone.
> 
> While the upgraded node was rebooting and the other node was down,
> about 120 PGs went inactive due to min_size=2.
> 
> Waiting for the nodes to come back and for recovery to finish, it took
> about 15 minutes before all VMs running inside OpenStack were back again.
> 
> While you are upgrading or performing any other maintenance with size=3,
> you can't tolerate the failure of another node, as that will cause PGs
> to go inactive.
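> 
> To spell out the arithmetic: with size=3 and min_size=2, one replica
> offline for maintenance leaves 2 copies, and a second failure leaves
> only 1 copy, which is below min_size, so the PG goes inactive. With
> size=4 and min_size=2, the same double failure still leaves 2 copies,
> which meets min_size, so I/O continues while recovery runs.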
> 
> This made me think about using size=4 and min_size=2 to prevent this
> situation.
> 
> This obviously has implications on write latency and cost, but it would
> prevent such a situation.
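> 
> For reference, changing an existing replicated pool would look something
> like this (just a sketch; 'rbd' is an assumed pool name here, and
> min_size is usually already 2 on a size=3 pool):
> 
>   ceph osd pool set rbd size 4      # add a fourth replica
>   ceph osd pool set rbd min_size 2  # keep serving I/O with two copies
> 
>   # verify the new values
>   ceph osd pool get rbd size
>   ceph osd pool get rbd min_size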
> 
> Is anybody here running a Ceph cluster with size=4 and min_size=2 for
> this reason?
> 
> Thank you,
> 
> Wido
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
