Exactly. We minimize the blast radius (the data put at risk by a single
device failure) by spreading DB/WAL across more, smaller devices rather
than fewer, larger ones. We encountered this same issue on an earlier
iteration of our hardware design. With rotational drives and NVMe
devices, we are now aiming for a 6:1 ratio, based on our CRUSH rules,
rotational disk sizing, NVMe sizing, server sizing, EC setup, etc.
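To make the trade-off concrete, here is a rough sketch of the arithmetic.
It assumes the 6:1 ratio means six HDD OSDs per NVMe DB/WAL device, that
OSDs are spread as evenly as possible across the DB/WAL devices, and that
the 12-HDD host and 3.2 TB NVMe size are purely hypothetical figures. The
function names are illustrative and not part of any Ceph tool.

```python
import math


def osds_lost_per_db_failure(num_hdd_osds: int, num_db_devices: int) -> int:
    """Worst-case number of HDD OSDs that go down when ONE DB/WAL device dies.

    Assumes (hypothetically) that OSDs are spread as evenly as possible
    across the DB/WAL devices, so the busiest device carries ceil(n / d).
    """
    return math.ceil(num_hdd_osds / num_db_devices)


def db_space_per_osd_tb(db_device_tb: float, osds_per_db: int) -> float:
    """DB/WAL capacity each OSD gets if one device is split evenly."""
    return db_device_tb / osds_per_db


if __name__ == "__main__":
    # Hypothetical host with 12 HDD OSDs.
    for nvme_count in (1, 2):
        lost = osds_lost_per_db_failure(12, nvme_count)
        print(f"{nvme_count} NVMe(s): one failure takes out {lost} of 12 OSDs")

    # Same idea as two SSDs in front of 8 OSDs: one SSD failure = 4 OSDs.
    print(osds_lost_per_db_failure(8, 2))

    # Hypothetical 3.2 TB NVMe shared by 6 OSDs (6:1 ratio).
    print(f"{db_space_per_osd_tb(3.2, 6):.2f} TB of DB/WAL per OSD")
```

The design choice is simply that doubling the number of DB/WAL devices
halves how many OSDs (and how much data to backfill) one device failure
drags down, at the cost of more drive bays and smaller DB/WAL partitions.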

Make sure to use write-friendly (high-endurance) NVMe drives for DB/WAL,
and failures should be far less frequent.
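One way to compare "write-friendly" drives is by their rated endurance,
usually quoted as drive writes per day (DWPD) over the warranty period.
The sketch below uses the common conversion TBW = capacity x DWPD x 365 x
warranty years; the drive sizes, DWPD ratings and the 2 TB/day DB/WAL
write load are hypothetical numbers chosen only for illustration.

```python
def rated_tbw(capacity_tb: float, dwpd: float, warranty_years: float) -> float:
    """Total terabytes written a drive is rated for over its warranty."""
    return capacity_tb * dwpd * 365 * warranty_years


def years_until_rated_wear(capacity_tb: float, dwpd: float,
                           warranty_years: float, daily_writes_tb: float) -> float:
    """Years until a given daily write load uses up the rated endurance."""
    return rated_tbw(capacity_tb, dwpd, warranty_years) / daily_writes_tb / 365


if __name__ == "__main__":
    # Hypothetical 1.6 TB drives, 5-year warranty, 2 TB/day of DB/WAL writes.
    for label, dwpd in (("read-intensive", 0.3), ("write-intensive", 3.0)):
        years = years_until_rated_wear(1.6, dwpd, 5, 2.0)
        print(f"{label}: ~{years:.1f} years to rated wear-out")
```

Under these assumed numbers, the 3 DWPD drive lasts ten times longer than
the 0.3 DWPD one for the same write load, which is the point of picking
write-oriented NVMe for DB/WAL.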

On Thu, Sep 9, 2021 at 9:11 AM Janne Johansson <icepic...@gmail.com> wrote:
>
> On Thu, 9 Sep 2021 at 16:09, Michal Strnad <michal.str...@cesnet.cz> wrote:
> > When the disk holding the DB dies, all dependent OSDs (six or eight
> > in our environment) become inaccessible.
> > How do you handle this in your environment?
>
> Have two SSDs for 8 OSDs, so only half of them go away when one SSD dies.
>
> --
> May the most significant bit of your life be positive.
> _______________________________________________
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io