On Wed, Feb 28, 2018 at 3:02 AM Zoran Bošnjak <
[email protected]> wrote:

> I am aware of monitor consensus requirement. It is taken care of (there is
> a third room with only monitor node). My problem is about OSD redundancy,
> since I can only use 2 server rooms for OSDs.
>
> I could use EC pools, LRC, or any other Ceph configuration, but I could
> not find one that addresses the issue. The write-acknowledge rule should
> read something like this:
> 1. If both rooms are "up", do not acknowledge a write until an ack is
> received from both rooms.
> 2. If only one room is "up" (forgetting rule 1), acknowledge the write on
> the first ack.


This check is performed when PGs go active, not on every write: once a PG
goes active, it needs a commit from every OSD in the acting set before a
write completes, or else it goes through peering again. That is the
standard behavior for Ceph if you configure CRUSH to place data
redundantly in both rooms.
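
For reference, a replicated CRUSH rule along those lines might look
roughly like the following. This is a sketch only: the rule name is made
up, and it assumes a map whose hierarchy has "room" and "host" buckets
under the "default" root, with a pool using size = 4.

```
# Sketch: spread 4 replicas across 2 rooms, 2 hosts per room.
# Rule name is illustrative; "default", "room" and "host" assume
# the usual CRUSH hierarchy on your map.
rule replicated_two_rooms {
    ruleset 1
    type replicated
    min_size 2
    max_size 4
    step take default                  # start at the root bucket
    step choose firstn 2 type room     # pick both rooms
    step chooseleaf firstn 2 type host # 2 hosts (OSDs) per room
    step emit
}
```

With size = 4 and min_size = 2 on the pool, losing a whole room leaves 2
replicas, so PGs can re-peer among the survivors and stay active.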


>
> The Ceph documentation talks about recursively defined locality sets, so I
> assume it allows for different rules at the room/rack/host levels.
> But as far as I can see, placement cannot depend on "room" availability.
>
> Is this possible to configure?
> I would appreciate example configuration commands.
>
> regards,
> Zoran
>
> ________________________________________
> From: Eino Tuominen <[email protected]>
> Sent: Wednesday, February 28, 2018 8:47 AM
> To: Zoran Bošnjak; [email protected]
> Subject: Re: mirror OSD configuration
>
> > Is it possible to configure crush map such that it will tolerate "room"
> failure? In my case, there is one
> > network switch per room and one power supply per room, which makes a
> single point of (room) failure.
>
> Hi,
>
> You cannot achieve real room redundancy with just two rooms. At minimum
> you'll need a third room (a witness) with independent network connections
> to the two server rooms; otherwise it's impossible to keep monitor quorum
> when one of the two rooms fails. Then you'd need to consider OSD
> redundancy. You could make do with size = 4, min_size = 2 (or any
> min_size = n, size = 2*n), but that's not perfect, as you lose exactly
> half of the replicas in case of a room failure. If you were able to use
> EC pools, you'd have more options with LRC coding (
> http://docs.ceph.com/docs/master/rados/operations/erasure-code-lrc/).
>
> We run Ceph in a 3-room configuration with 3 monitors, size=3, min_size=2.
> It works, but it's not without hassle either.
>
> --
>   Eino Tuominen
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
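
To make the LRC pointer above concrete, profile creation could look
roughly like this. It is an untested sketch: the profile and pool names
and the k/m/l values are illustrative, and the exact parameter names have
varied between Ceph releases, so check the linked docs for your version.

```shell
# Illustrative LRC erasure-code profile; names and k/m/l values are
# examples only. l=3 adds a local parity per group of 3 chunks,
# and crush-locality=room keeps each group inside one room.
ceph osd erasure-code-profile set lrc_rooms \
    plugin=lrc k=4 m=2 l=3 \
    crush-locality=room crush-failure-domain=host

# Create a pool (example name, 128 PGs) using that profile.
ceph osd pool create ecpool 128 128 erasure lrc_rooms
```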
