Hi Matthew,

To make a simple comparison: RAID 5 is generally not recommended with large
disks (>1 TB) because of the (low but non-zero) probability of losing another
disk during the rebuild.
Now imagine losing a whole host full of disks.

Additionally, min_size=1 means you can no longer safely perform maintenance
on your cluster (updates, etc.); it's dangerous.

Unless you can afford to lose/rebuild your cluster, you should never run with
min_size < 2.
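
For example (using the pool name from the commands quoted below), you can
check and raise a pool's min_size with:

~~~
# show the current min_size for the pool
ceph osd pool get my_pool min_size
# keep it at 2 or higher unless you accept the risk above
ceph osd pool set my_pool min_size 2
~~~
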
________________________________________________________

Regards,

*David CASIER*

________________________________________________________



On Tue, Dec 5, 2023 at 10:03, duluxoz <dulu...@gmail.com> wrote:

> Thanks David, I knew I had something wrong  :-)
>
> Just for my own edification: why is k=2, m=1 not recommended for
> production? Is it considered too "fragile", or is it something else?
>
> Cheers
>
> Dulux-Oz
>
> On 05/12/2023 19:53, David Rivera wrote:
> > The first problem here is that you are using crush-failure-domain=osd when
> > you should be using crush-failure-domain=host. With three hosts, you would
> > have to use k=2, m=1; however, this is not recommended in a production
> > environment.
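> >
> > As a rough sketch (the profile and rule names below are just examples, and
> > since k/m cannot be changed on an existing EC pool you would need to create
> > a new pool with this profile and migrate the images):
> >
> > ~~~
> > # one chunk per host; any single host can fail without data loss
> > ceph osd erasure-code-profile set my_ec_profile_host plugin=jerasure \
> >     k=2 m=1 crush-failure-domain=host
> > ceph osd crush rule create-erasure my_ec_rule_host my_ec_profile_host
> > ~~~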
> >
> > On Mon, Dec 4, 2023, 23:26 duluxoz <dulu...@gmail.com> wrote:
> >
> >     Hi All,
> >
> >     Looking for some help/explanation around erasure code pools, etc.
> >
> >     I set up a 3-node Ceph (Quincy) cluster, with each box holding 7 OSDs
> >     (HDDs) and each box running a Monitor, Manager, and iSCSI Gateway. For
> >     the record, the cluster runs beautifully, without resource issues, etc.
> >
> >     I created an Erasure Code Profile, etc:
> >
> >     ~~~
> >     ceph osd erasure-code-profile set my_ec_profile plugin=jerasure \
> >         k=4 m=2 crush-failure-domain=osd
> >     ceph osd crush rule create-erasure my_ec_rule my_ec_profile
> >     ceph osd crush rule create-replicated my_replicated_rule default host
> >     ~~~
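> >
> >     For reference, the resulting profile and rule can be inspected with:
> >
> >     ~~~
> >     ceph osd erasure-code-profile get my_ec_profile
> >     ceph osd crush rule dump my_ec_rule
> >     ~~~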
> >
> >     My Crush Map is:
> >
> >     ~~~
> >     # begin crush map
> >     tunable choose_local_tries 0
> >     tunable choose_local_fallback_tries 0
> >     tunable choose_total_tries 50
> >     tunable chooseleaf_descend_once 1
> >     tunable chooseleaf_vary_r 1
> >     tunable chooseleaf_stable 1
> >     tunable straw_calc_version 1
> >     tunable allowed_bucket_algs 54
> >
> >     # devices
> >     device 0 osd.0 class hdd
> >     device 1 osd.1 class hdd
> >     device 2 osd.2 class hdd
> >     device 3 osd.3 class hdd
> >     device 4 osd.4 class hdd
> >     device 5 osd.5 class hdd
> >     device 6 osd.6 class hdd
> >     device 7 osd.7 class hdd
> >     device 8 osd.8 class hdd
> >     device 9 osd.9 class hdd
> >     device 10 osd.10 class hdd
> >     device 11 osd.11 class hdd
> >     device 12 osd.12 class hdd
> >     device 13 osd.13 class hdd
> >     device 14 osd.14 class hdd
> >     device 15 osd.15 class hdd
> >     device 16 osd.16 class hdd
> >     device 17 osd.17 class hdd
> >     device 18 osd.18 class hdd
> >     device 19 osd.19 class hdd
> >     device 20 osd.20 class hdd
> >
> >     # types
> >     type 0 osd
> >     type 1 host
> >     type 2 chassis
> >     type 3 rack
> >     type 4 row
> >     type 5 pdu
> >     type 6 pod
> >     type 7 room
> >     type 8 datacenter
> >     type 9 zone
> >     type 10 region
> >     type 11 root
> >
> >     # buckets
> >     host ceph_1 {
> >        id -3            # do not change unnecessarily
> >        id -4 class hdd  # do not change unnecessarily
> >        # weight 38.09564
> >        alg straw2
> >        hash 0  # rjenkins1
> >        item osd.0 weight 5.34769
> >        item osd.1 weight 5.45799
> >        item osd.2 weight 5.45799
> >        item osd.3 weight 5.45799
> >        item osd.4 weight 5.45799
> >        item osd.5 weight 5.45799
> >        item osd.6 weight 5.45799
> >     }
> >     host ceph_2 {
> >        id -5            # do not change unnecessarily
> >        id -6 class hdd  # do not change unnecessarily
> >        # weight 38.09564
> >        alg straw2
> >        hash 0  # rjenkins1
> >        item osd.7 weight 5.34769
> >        item osd.8 weight 5.45799
> >        item osd.9 weight 5.45799
> >        item osd.10 weight 5.45799
> >        item osd.11 weight 5.45799
> >        item osd.12 weight 5.45799
> >        item osd.13 weight 5.45799
> >     }
> >     host ceph_3 {
> >        id -7            # do not change unnecessarily
> >        id -8 class hdd  # do not change unnecessarily
> >        # weight 38.09564
> >        alg straw2
> >        hash 0  # rjenkins1
> >        item osd.14 weight 5.34769
> >        item osd.15 weight 5.45799
> >        item osd.16 weight 5.45799
> >        item osd.17 weight 5.45799
> >        item osd.18 weight 5.45799
> >        item osd.19 weight 5.45799
> >        item osd.20 weight 5.45799
> >     }
> >     root default {
> >        id -1            # do not change unnecessarily
> >        id -2 class hdd  # do not change unnecessarily
> >        # weight 114.28693
> >        alg straw2
> >        hash 0  # rjenkins1
> >        item ceph_1 weight 38.09564
> >        item ceph_2 weight 38.09564
> >        item ceph_3 weight 38.09564
> >     }
> >
> >     # rules
> >     rule replicated_rule {
> >        id 0
> >        type replicated
> >        step take default
> >        step chooseleaf firstn 0 type host
> >        step emit
> >     }
> >     rule my_replicated_rule {
> >        id 1
> >        type replicated
> >        step take default
> >        step chooseleaf firstn 0 type host
> >        step emit
> >     }
> >     rule my_ec_rule {
> >        id 2
> >        type erasure
> >        step set_chooseleaf_tries 5
> >        step set_choose_tries 100
> >        step take default
> >        step choose indep 3 type host
> >        step chooseleaf indep 2 type osd
> >        step emit
> >     }
> >
> >     # end crush map
> >     ~~~
> >
> >     Finally I create a pool:
> >
> >     ~~~
> >     ceph osd pool create my_pool 32 32 erasure my_ec_profile my_ec_rule
> >     ceph osd pool application enable my_meta_pool rbd
> >     rbd pool init my_meta_pool
> >     rbd pool init my_pool
> >     rbd create --size 16T my_pool/my_disk_1 --data-pool my_pool \
> >         --image-feature journaling
> >     ~~~
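> >
> >     For reference, the pool's size and min_size (the minimum number of
> >     chunks that must be available for I/O to continue) can be inspected
> >     with:
> >
> >     ~~~
> >     ceph osd pool ls detail
> >     ceph osd pool get my_pool min_size
> >     ~~~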
> >
> >     So all this is to have some VMs (oVirt VMs, for the record) with
> >     automatic failover in the case of a Ceph Node loss - i.e. I was trying
> >     to "replicate" a 3-disk RAID 5 array across the Ceph Nodes, so that I
> >     could lose a Node and still have a working set of VMs.
> >
> >     However, I took one of the Ceph Nodes down (gracefully) for some
> >     maintenance the other day and I lost *all* the VMs (i.e. oVirt
> >     complained that there was no active pool). As soon as I brought the
> >     down node back up, everything was good again.
> >
> >     So my question is: What did I do wrong with my config?
> >
> >     Should I, for example, change the EC Profile to `k=2, m=1`? But how is
> >     that practically different from `k=4, m=2`? Yes, the latter spreads the
> >     pool over more disks, but it should still only put 2 disks on each
> >     node, shouldn't it?
> >
> >     Thanks in advance
> >
> >     Cheers
> >
> >     Dulux-Oz
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
