Hi Jan,

With the mclock scheduler you can do the following:

Set osd_mclock_override_recovery_settings to true, e.g.:

~ # ceph config set osd osd_mclock_override_recovery_settings true

and then increase osd_max_backfills, e.g.:

~ # ceph config set osd osd_max_backfills 5

This takes effect immediately. I suggest starting with a low number
of max backfills and monitoring how your cluster behaves before increasing
it further.
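
For reference, one way to confirm the value an OSD is actually running with
and to watch the effect (osd.0 is just an example OSD id here):

~ # ceph config show osd.0 osd_max_backfills
~ # ceph -s

ceph -s reports recovery throughput under "io:", so you can see whether the
higher backfill limit actually makes a difference.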

Important: once backfill and recovery are done, lower osd_max_backfills again:

~ # ceph config set osd osd_max_backfills 2

and set the override back to false:

~ # ceph config set osd osd_mclock_override_recovery_settings false
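
If you want to double-check that both values are back where you expect them,
ceph config get shows what is stored centrally:

~ # ceph config get osd osd_max_backfills
~ # ceph config get osd osd_mclock_override_recovery_settings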

Cheers

Stephan

On Wed, Oct 8, 2025 at 07:08, Kirby Haze <[email protected]> wrote:

> I will assume you are running the default mclock scheduler and not wpq. I'm
> not too familiar with tuning mclock settings, but these are the docs to look
> at:
>
> https://docs.ceph.com/en/latest/rados/configuration/mclock-config-ref/#recovery-backfill-options
>
> osd_max_backfills is set to 1 by default and this is the first thing I
> would tune if you want faster backfilling.
>
> I would look at this setting first before digging into the various
> knobs mclock provides:
>
> https://docs.ceph.com/en/latest/rados/configuration/mclock-config-ref/#confval-osd_mclock_profile
>
> I use wpq, and I have two main levers for backfilling (an example is
> sketched just below):
> osd_max_backfills - higher means faster backfilling
> osd_recovery_sleep (and the other sleep settings) - throttles recovery ops
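>
> For illustration, adjusting those wpq knobs might look like this (the
> values below are just placeholder examples, not recommendations):
>
> ceph config set osd osd_max_backfills 3
> ceph config set osd osd_recovery_sleep_hdd 0.05
> ceph config set osd osd_recovery_sleep_ssd 0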
>
> mclock doesn't use the sleep configs, so I'm not too sure about the various
> knobs mclock has, but the docs above have some good options to tweak. I
> would probably experiment with the different mclock profiles, such as the
> high_recovery_ops profile, to see if that speeds up backfilling:
>
> https://docs.ceph.com/en/latest/rados/configuration/mclock-config-ref/#high-recovery-ops
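>
> For example, switching to that profile for the duration of the backfill and
> removing the override afterwards might look like:
>
> ceph config set osd osd_mclock_profile high_recovery_ops
> ceph config rm osd osd_mclock_profile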
>
>
>
> On Tue, Oct 7, 2025 at 2:20 AM Jan Kasprzak <[email protected]> wrote:
>
> >         Hello, Ceph users,
> >
> > on my new cluster, which I filled with testing data two weeks ago,
> > there are many remapped PGs in the backfill_wait state, probably as a
> > result of autoscaling the number of PGs per pool. But the recovery speed
> > is quite low, on the order of a few MB/s and < 10 obj/s according to ceph -s.
> >
> > The cluster is otherwise idle, with no client traffic after the initial
> > import, so I wonder why the backfill does not progress faster. Also, it
> > seems like more PGs are getting remapped as existing ones get successfully
> > backfilled - the percentage of misplaced objects has stayed around 6 % for
> > the last two weeks.
> >
> > The PGs waiting for backfill all belong to the biggest pool I have,
> > according to "ceph pg dump | grep backfill" - no surprise here.
> > The pool has 229 TB of data and currently 128 PGs. It is erasure-coded
> > with k=4, m=2. The second biggest pool has only 23 TB of data:
> >
> > rados df
> > POOL_NAME            USED     OBJECTS   CLONES    COPIES  MISSING_ON_PRIMARY  UNFOUND  DEGRADED  RD_OPS   RD    WR_OPS      WR  USED COMPR  UNDER COMPR
> > pool_with_backfill   229 TiB  10086940       0  60521640                   0        0         0       0  0 B  72545009  54 TiB         0 B          0 B
> > second_biggest_pool   23 TiB   1153174       0   6919044                   0        0         0       0  0 B  38506397  16 TiB         0 B          0 B
> > [...]
> >
> > I tried "ceph osd pool force-backfill $pool"; it helped to speed
> > things up a bit, but recovery still runs at 50-200 MB/s and 4-20 obj/s.
> > The initial data import ran at around 600 MB/s.
> >
> > Is this normal, or can I somehow speed up the recovery a bit?
> >
> > Output of ceph -s:
> >
> >   cluster:
> >     id:     ...
> >     health: HEALTH_WARN
> >             2 large omap objects
> >
> >   services:
> >     mon: 3 daemons, quorum istor11,istor21,istor31 (age 13d)
> >     mgr: istor31(active, since 3w), standbys: istor21, istor11
> >     osd: 36 osds: 36 up (since 2w), 36 in (since 3w); 14 remapped pgs
> >
> >   data:
> >     pools:   45 pools, 1505 pgs
> >     objects: 13.39M objects, 198 TiB
> >     usage:   303 TiB used, 421 TiB / 724 TiB avail
> >     pgs:     5335074/80345832 objects misplaced (6.640%)
> >              1449 active+clean
> >              34   active+clean+scrubbing
> >              11   active+remapped+backfill_wait+forced_backfill
> >              8    active+clean+scrubbing+deep
> >              2    active+remapped+forced_backfill
> >              1    active+remapped+backfilling+forced_backfill
> >
> >   io:
> >     recovery: 69 MiB/s, 4 objects/s
> >
> > The OSDs are HDD-based with metadata on NVMe, 4 OSDs per node,
> > and all the nodes have load average somewhere between 0.3 and 0.6.
> >
> > Thanks!
> >
> > -Yenya
> >
> > --
> > | Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
> > | https://www.fi.muni.cz/~kas/                        GPG: 4096R/A45477D5 |
> >     We all agree on the necessity of compromise. We just can't agree on
> >     when it's necessary to compromise.                     --Larry Wall
>
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
