The question then is why this happens only on the S3 cluster and not also on
the non-S3 cluster, even though write IO during recovery/backfilling is
actually much higher there than on the S3 one?
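
To pin that down, something like the sketch below could be run against both
clusters side by side during a reweight. It only assumes the ceph CLI is
reachable from one host; the two --conf paths are placeholders for however
the clusters are actually addressed:

    import json
    import subprocess

    # Placeholder conf paths for the two clusters; adjust to however they
    # are actually reached (per-cluster conf files, --cluster names, ...).
    CLUSTERS = {
        "s3":    "/etc/ceph/s3/ceph.conf",
        "no-s3": "/etc/ceph/hpc/ceph.conf",
    }

    def ceph_json(conf, *args):
        """Run a ceph CLI command against one cluster and parse its JSON output."""
        out = subprocess.check_output(
            ["ceph", "--conf", conf, *args, "--format", "json"])
        return json.loads(out)

    for name, conf in CLUSTERS.items():
        pgmap = ceph_json(conf, "status").get("pgmap", {})
        # Recovery-rate key names vary between releases, so match on substring.
        recovery = {k: v for k, v in pgmap.items()
                    if "recover" in k or "backfill" in k or "misplaced" in k}
        checks = ceph_json(conf, "health", "detail").get("checks", {})
        alerts = [c for c in checks if "BLUESTORE" in c or "BLUEFS" in c]
        print(f"{name}: recovery={recovery} bluestore/bluefs checks={alerts}")

Comparing the recovery rates and any BlueStore/BlueFS health checks captured
at the same point in time should make the difference between the two clusters
easier to quantify.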

On Tue, Feb 3, 2026 at 3:27 PM Eugen Block via ceph-users <
[email protected]> wrote:

> I concur with Robert's statement; having HDDs only could explain what
> you're describing. I'm not sure where you got your number (10 DB devices
> max per NVMe), but the docs [0] say not to put more than 15 OSDs on one
> DB/WAL NVMe:
>
> > DB/WAL offload (optional)
> >
> > 1x SSD partition per HDD OSD
> > 4-5x HDD OSDs per DB/WAL SATA SSD
> > <= 15 HDD OSDs per DB/WAL NVMe SSD
> But you're correct about the SPOF: if one NVMe dies, all OSDs that have
> their DB/WAL on that NVMe die as well.
>
> [0]
>
> https://docs.ceph.com/en/latest/start/hardware-recommendations/#minimum-hardware-recommendations
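
As a quick sanity check of those ratios against the 28-OSD hosts mentioned
further down in the thread (a sketch only; the 16 TB drive size and the
~1-4 % DB sizing rule of thumb are illustrative assumptions, not taken from
the docs excerpt above):

    import math

    hdd_osds_per_host = 28      # from this thread
    max_osds_per_nvme = 15      # limit quoted from [0]
    nvme_slots_free = 1         # 2 NVMe slots per host, one taken by the OS
    hdd_size_tb = 16            # assumed drive size, for illustration only
    db_fraction = 0.04          # high end of the commonly cited ~1-4% rule of thumb

    nvmes_needed = math.ceil(hdd_osds_per_host / max_osds_per_nvme)
    db_per_osd_gb = hdd_size_tb * 1000 * db_fraction

    print(f"DB/WAL NVMes needed per host: {nvmes_needed} (slots free: {nvme_slots_free})")
    print(f"~{db_per_osd_gb:.0f} GB of DB space per OSD at {db_fraction:.0%} "
          f"of a {hdd_size_tb} TB HDD")

In other words, with one NVMe slot taken by the OS, a 28-OSD host ends up one
DB/WAL device short of the 15-OSDs-per-NVMe recommendation.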
>
> On Tue, Feb 3, 2026 at 3:00 PM Rok Jaklič via ceph-users <
> [email protected]> wrote:
>
> > We have 28 OSDs per host and can only have 2 NVMe drives per host (one
> > being used for the OS) ... and if I remember correctly, a maximum of 10
> > OSDs per NVMe is recommended, which is why we decided to go for HDD-only
> > clusters at the beginning.
> >
> > We have 2 clusters set up this way, one for HPC (no radosgw/s3) and the
> > other for "users" (radosgw/s3), running for over 4 years now ... it works
> > ok, performance is ok, it's just that we have this problem where we have
> > to do a gentle reweight of failed OSDs.
> >
> > Thanks for the info, we will consider NVMe ... although then there is a
> > SPOF for those OSDs which have their DB on the NVMe?
> >
> > Rok
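
On the gentle reweight mentioned above: a minimal sketch of how that could be
scripted, stepping the override weight down and waiting for the cluster to
settle between steps. The OSD id, the step values, and the crude settle check
are placeholders, not a recommendation:

    import subprocess
    import time

    OSD_ID = 42                        # placeholder for the failing OSD
    STEPS = (0.8, 0.6, 0.4, 0.2, 0.0)  # placeholder step values

    for weight in STEPS:
        subprocess.run(["ceph", "osd", "reweight", str(OSD_ID), str(weight)],
                       check=True)
        # Crude settle check: wait until 'ceph pg stat' no longer reports
        # backfilling or recovering PGs before taking the next step.
        while True:
            stat = subprocess.check_output(["ceph", "pg", "stat"]).decode()
            if "backfill" not in stat and "recover" not in stat:
                break
            time.sleep(60)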
> >
> > On Tue, Feb 3, 2026 at 2:35 PM Robert Sander via ceph-users <
> > [email protected]> wrote:
> >
> > > On 03.02.26 at 2:31 PM, Rok Jaklič wrote:
> > > > On Tue, Feb 3, 2026 at 2:26 PM Robert Sander via ceph-users <
> > > > [email protected]> wrote:
> > > >
> > > >>     On 03.02.26 at 2:18 PM, Rok Jaklič via ceph-users wrote:
> > > >>
> > > >>      >              2 OSD(s) experiencing slow operations in BlueStore
> > > >>      >              2 OSD(s) experiencing stalled read in db device of BlueFS
> > > >>
> > > >>     Are your OSDs HDD only?
> > > >
> > >
> > > > Yes.
> > > >
> > > > It does not affect users much. Usually those messages appear when we
> > > > are reweighting and replacing failed disks.
> > >
> > > These HDDs will be maxed out by the recovery work and cannot serve
> > > anything else anymore.
> > >
> > > I have seen HDD-only clusters go into the "spiral of death" because
> > > the HDDs cannot respond fast enough, with OSDs randomly dropping out
> > > and making the whole system unstable.
> > >
> > > RocksDB is such a random-IO-heavy application that it is not suitable
> > > for HDDs. It should always be put on flash storage (SSD/NVMe).
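
If the HDDs are indeed being saturated, throttling recovery is the usual
first mitigation while a reweight runs; a minimal sketch using the standard
OSD recovery options, with illustrative values rather than tuned
recommendations:

    import subprocess

    def osd_config_set(option, value):
        """Set an OSD option cluster-wide via the MON config store."""
        subprocess.run(["ceph", "config", "set", "osd", option, str(value)],
                       check=True)

    # Illustrative values only; the right numbers depend on the cluster.
    osd_config_set("osd_max_backfills", 1)            # one backfill per OSD at a time
    osd_config_set("osd_recovery_max_active_hdd", 1)  # limit concurrent recovery ops on HDD OSDs
    osd_config_set("osd_recovery_sleep_hdd", 0.1)     # pause between recovery ops on HDDs

Slowing recovery down trades a longer rebalance for keeping client IO and the
RocksDB/BlueFS reads responsive.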
> > >
> > > Regards
> > > --
> > > Robert Sander
> > > Linux Consultant
> > >
> > > Heinlein Consulting GmbH
> > > Schwedter Str. 8/9b, 10119 Berlin
> > >
> > > https://www.heinlein-support.de
> > >
> > > Tel: +49 30 405051 - 0
> > > Fax: +49 30 405051 - 19
> > >
> > > Amtsgericht Berlin-Charlottenburg - HRB 220009 B
> > > Geschäftsführer: Peer Heinlein - Sitz: Berlin
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
