Use a crush rule like this for the replicated pool (a crushmap sketch follows the list):

1) root default class XXX
2) choose 2 rooms
3) choose 2 disks
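
In crushmap text form that could look something like the sketch below. It's only a sketch: the rule name and id are made up, XXX stands for whatever device class you use, and I'm using host as the leaf type so the two disks per room land on different servers:

rule replicated_2rooms {
        # id 2 is just an example, pick a free rule id
        id 2
        type replicated
        min_size 1
        max_size 10
        # XXX = your device class, e.g. hdd or ssd
        step take default class XXX
        # pick both rooms...
        step choose firstn 2 type room
        # ...and two different hosts (one OSD each) per room
        step chooseleaf firstn 2 type host
        step emit
}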

That'll get you 4 OSDs in two rooms; the first 3 of them get data and
the fourth is ignored. That guarantees that losing a room loses you at
most 2 of the 3 copies. This is for disaster recovery only: it
guarantees durability if you lose a room, but not availability (the
single surviving copy is below the default min_size of 2, so the pool
stops serving I/O until you intervene).
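
To actually get such a rule into the cluster, one way (untested here; adapt the pool name, rule name and id to your setup) is to edit the crushmap offline and then point the replicated (metadata) pool at the new rule:

# dump and decompile the current crushmap
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt

# add the rule above to crushmap.txt, then recompile and sanity-check it
crushtool -c crushmap.txt -o crushmap.new
crushtool -i crushmap.new --test --rule 2 --num-rep 3 --show-mappings

# inject the new map and switch the pool (this triggers data movement)
ceph osd setcrushmap -i crushmap.new
ceph osd pool set <metadata-pool> crush_rule replicated_2rooms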

3+2 erasure coding cannot be split across two rooms in this way: you
need 3 of the 5 shards to survive, but one of the two rooms will always
hold at least 3 shards, so losing that room loses too many.
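
As a rough rule of thumb (my own generalisation, so double-check it for your case): to survive a room loss with EC across two rooms, m has to be at least as large as the number of shards a single room holds. For example, a k=2, m=2 profile with two shards placed in each room stays reconstructable after losing a room -- again durability only, since availability would additionally require lowering min_size to k.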

Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Thu, Nov 28, 2019 at 5:40 PM Francois Legrand <f...@lpnhe.in2p3.fr> wrote:
>
> Hi,
> I have a cephfs in production based on 2 pools (data+metadata).
>
> Data is in erasure coding with the profile:
> crush-failure-domain=host
> crush-root=default
> jerasure-per-chunk-alignment=false
> k=3
> m=2
> plugin=jerasure
> technique=reed_sol_van
> w=8
>
> Metadata is in replicated mode with size=3 (3 replicas).
>
> The crush rules are as follows:
> [
>      {
>          "rule_id": 0,
>          "rule_name": "replicated_rule",
>          "ruleset": 0,
>          "type": 1,
>          "min_size": 1,
>          "max_size": 10,
>          "steps": [
>              {
>                  "op": "take",
>                  "item": -1,
>                  "item_name": "default"
>              },
>              {
>                  "op": "chooseleaf_firstn",
>                  "num": 0,
>                  "type": "host"
>              },
>              {
>                  "op": "emit"
>              }
>          ]
>      },
>      {
>          "rule_id": 1,
>          "rule_name": "ec_data",
>          "ruleset": 1,
>          "type": 3,
>          "min_size": 3,
>          "max_size": 5,
>          "steps": [
>              {
>                  "op": "set_chooseleaf_tries",
>                  "num": 5
>              },
>              {
>                  "op": "set_choose_tries",
>                  "num": 100
>              },
>              {
>                  "op": "take",
>                  "item": -1,
>                  "item_name": "default"
>              },
>              {
>                  "op": "chooseleaf_indep",
>                  "num": 0,
>                  "type": "host"
>              },
>              {
>                  "op": "emit"
>              }
>          ]
>      }
> ]
>
> When we installed it, everything was in the same room, but now we have
> split our cluster (6 servers, soon 8) across 2 rooms. So we updated the
> crushmap by adding a room layer (ceph osd crush add-bucket room1 room,
> etc.) and moved all our servers to the right place in the tree
> (ceph osd crush move server1 room=room1, etc.).
>
> Now, we would like to change the rules to set the failure domain to
> room instead of host (to be sure that, in case of a disaster in one of
> the rooms, we still have a copy in the other).
>
> What is the best strategy to do this?
>
> F.