I tried yesterday: if I stop the OSD and just set it to out (without
reweighting), then radosgw does not stop/hang and the rebalance finishes,
of course with a little more traffic, ...

So one should not crush-reweight a failed disk, but just stop the failed
OSD, set it out, and let Ceph rebalance.
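
For reference, this is roughly the sequence that worked for us (osd.12 is
just an example id; the stop command will differ on cephadm/containerized
deployments):

  systemctl stop ceph-osd@12        # stop the failing OSD daemon
  ceph osd out 12                   # mark it out and let Ceph rebalance
  # wait for recovery/backfill to finish, then remove the OSD as usual

whereas this was the step that led to the radosgw hangs:

  ceph osd crush reweight osd.12 0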

Thx to all.

Rok

On Thu, Feb 5, 2026 at 1:08 AM Kirby Haze via ceph-users <[email protected]>
wrote:

> When reweighting the drive down to 0, does the behavior change if you set
> it 'out' first?  The only other special thing that happens with reweight 0
> is that upmaps get removed for said OSD.
>
> If you think it is at the pool level, you could run a simple rados bench
> against a pool while you reweight an OSD to see if your issue still shows
> up.
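>
> For example, something like this (pool name and numbers are placeholders):
>
>   rados bench -p testpool 60 write -t 16   # 60-second write benchmark, 16 concurrent ops
>   rados -p testpool cleanup                # remove the benchmark objects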
>
> It could also be some issue with osd_op_queue being mclock, if you have
> that; it might be worth A/B testing with one OSD if you still see it.
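>
> Roughly (osd.0 as an example; osd_op_queue only takes effect after an OSD
> restart):
>
>   ceph config get osd.0 osd_op_queue        # see the current scheduler
>   ceph config set osd.0 osd_op_queue wpq    # switch one OSD to wpq for A/B
>   # restart that OSD, repeat the test, then set it back to mclock_scheduler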
>
> On Wed, Feb 4, 2026 at 12:31 PM Rok Jaklič via ceph-users <
> [email protected]> wrote:
>
> > If one HDD drive fails and smartctl shows errors, and I then decide to
> > drain it with crush reweight 0, would Ceph try to copy/move data/PGs
> > from the failed disk anyway?
> >
> > This is because we noticed on some non-Ceph clusters (RAID setups,
> > actually mail servers) that one failed drive may hog the "app/OS": the
> > "app" fails to read/write to the failed disk because "some queue" fills
> > up (since the app/OS cannot get data read or written).
> >
> > Could something similar happen in Ceph?
> >
> > Rok
> >
> > On Wed, Feb 4, 2026 at 9:52 AM Rok Jaklič <[email protected]> wrote:
> >
> > >
> > > On Wed, Feb 4, 2026 at 2:59 AM Anthony D'Atri <[email protected]>
> > > wrote:
> > >
> > >> Are these rear-bay drives, hence the limit of 2? Or you might
> > >> consider an M.2 AIC adapter card with bifurcation.  M.2 enterprise
> > >> SSDs are sunsetting, but for retrofits you should be able to find
> > >> Micron 6450 units.
> > >>
> > >> What’s your workload like?
> > >>
> > >
> > > On average 10-50 MB/s of writes, with spikes up to a few hundred MB/s
> > > during evening/night time; it went up to 1 GB/s during tests without a
> > > problem. All of these are S3 workloads/tests.
> > >
> > > I would have to check that on site; RM does not show it. However, we
> > > are just about to migrate to new machines, which have 4 NVMe slots ...
> > > so I am seriously considering moving WAL/DB to NVMe. I am still a
> > > little bit hesitant, though, since I am not sure this will solve the
> > > problem of why radosgw/S3 stops after some time when setting crush
> > > reweight to 0 on one failed disk. We do the same thing on an HPC
> > > cluster where radosgw/S3 is not used, and we do not experience this
> > > problem there. Also, if we move WAL/DB to NVMe and one NVMe fails, we
> > > would have to recover 10 OSDs, for example, which would take much
> > > longer than recovering just 1 OSD (while users are unable to access
> > > S3).
> > >
> > > ---
> > >
> > > My suspicion is that when we set the crush reweight of the failed disk
> > > to 0, all the other affected disks in that pool block some writes
> > > (because of recovery) and some queue fills up, which then stops/hangs
> > > radosgw...
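> > >
> > > If so, watching the cluster while reweighting might show where ops
> > > pile up; something like (osd.12 is just a placeholder):
> > >
> > >   ceph health detail | grep -i slow      # slow ops / blocked requests
> > >   ceph daemon osd.12 dump_ops_in_flight  # on the OSD host: queued ops
> > >   ceph config get osd osd_max_backfills  # current backfill throttle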
> > >
> > > Rok
> > >
> > >
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
