Hi all,
there have been many reports about overly slow backfill lately, and most of them
seemed related to a problem with mclock op scheduling in Quincy. The hallmark
was that backfill started fast and then slowed down a lot. I am now making the same
observation on an Octopus cluster with wpq, and it looks very much like a
problem with the scheduling of backfill operations. Here is what I see:
We added 95 disks to a set of disks shared by 2 pools. This is about 8% of the
total number of disks and they were distributed over all 12 OSD hosts. The 2
pools are 8+2 and 8+3 EC fs-data pools. Initially the backfill was as fast as
expected, but over the last day it has been really slow compared with expectations.
Only 33 PGs were backfilling. I have osd_max_backfills=3, and a simple estimate
(sketched below) says there should be between 100 and 200 PGs backfilling.
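For what it's worth, the estimate is nothing more than this back-of-the-envelope
calculation (my assumptions: each of the 95 new OSDs is a backfill target that can
hold up to osd_max_backfills reservations, and source-side reservations on the
remaining ~1165 OSDs are not the main limit at this ratio):

# target-side upper bound: 95 new OSDs * osd_max_backfills reservations each
echo $(( 95 * 3 ))    # 285
# allowing for source-side reservation collisions and EC shards sharing OSDs,
# I would expect roughly half to two thirds of that, i.e. 100 - 200 PGs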
To speed things up, I increased osd_max_backfills to 5, and the number of
backfilling PGs jumped right up to over 200. That's way more than the relative
increase alone would warrant. Just to check, I set osd_max_backfills back to 3
to see if the number of PGs would drop back to about 30. But no! Now I have 142 PGs
backfilling, which is in line with my estimate.
This looks very much like PGs that are eligible for backfill don't start, or
backfill reservations get dropped for some reason. Can anyone help me figure out
what might be the problem? I don't want to have to run a cron job that toggles
osd_max_backfills up and down (something like the sketch below); there must be
something else at play here. Output of ceph status and the config set commands
is included further down.
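Just to be explicit about the workaround I want to avoid, it would be something
along these lines, run periodically from cron. The raised value, the sleep and
the schedule are placeholders; the config set syntax is the same one shown in
the transcript below:

#!/bin/bash
# kick-backfill.sh - hypothetical workaround: temporarily raise
# osd_max_backfills to force new backfill reservations, then drop it
# back to the normal value.
ceph config set osd/class:hdd osd_max_backfills 5
sleep 300
ceph config set osd/class:hdd osd_max_backfills 3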
The number of backfilling PGs is decreasing again, and I would really like this
to be stable by itself. To give an idea of the scale of the problem: we are talking
about a rebalancing that takes either 2 weeks or 2 months. That's not a bagatelle.
Thanks and best regards,
Frank
[root@gnosis ~]# ceph config dump | sed -e "s/ */ /g" | grep :hdd | grep osd_max_backfills
osd class:hdd advanced osd_max_backfills 3
[root@gnosis ~]# ceph status
  cluster:
    id:     ###
    health: HEALTH_OK

  services:
    mon: 5 daemons, quorum ceph-01,ceph-02,ceph-03,ceph-25,ceph-26 (age 7d)
    mgr: ceph-25(active, since 10w), standbys: ceph-03, ceph-02, ceph-01, ceph-26
    mds: con-fs2:8 4 up:standby 8 up:active
    osd: 1260 osds: 1260 up (since 2d), 1260 in (since 2d); 6487 remapped pgs

  task status:

  data:
    pools:   14 pools, 25065 pgs
    objects: 1.49G objects, 2.8 PiB
    usage:   3.4 PiB used, 9.7 PiB / 13 PiB avail
    pgs:     2466697364/12910834502 objects misplaced (19.106%)
             18571 active+clean
             6453  active+remapped+backfill_wait
             34    active+remapped+backfilling
             7     active+clean+snaptrim

  io:
    client:   30 MiB/s rd, 221 MiB/s wr, 1.08k op/s rd, 1.54k op/s wr
    recovery: 1.0 GiB/s, 380 objects/s
[root@gnosis ~]# ceph config set osd/class:hdd osd_max_backfills 5
[root@gnosis ~]# ceph status
  cluster:
    id:     ###
    health: HEALTH_OK

  services:
    mon: 5 daemons, quorum ceph-01,ceph-02,ceph-03,ceph-25,ceph-26 (age 7d)
    mgr: ceph-25(active, since 10w), standbys: ceph-03, ceph-02, ceph-01, ceph-26
    mds: con-fs2:8 4 up:standby 8 up:active
    osd: 1260 osds: 1260 up (since 2d), 1260 in (since 2d); 6481 remapped pgs

  task status:

  data:
    pools:   14 pools, 25065 pgs
    objects: 1.49G objects, 2.8 PiB
    usage:   3.4 PiB used, 9.7 PiB / 13 PiB avail
    pgs:     2466120124/12911195308 objects misplaced (19.101%)
             18574 active+clean
             6247  active+remapped+backfill_wait
             234   active+remapped+backfilling
             6     active+clean+snaptrim
             2     active+clean+scrubbing+deep
             2     active+clean+scrubbing

  io:
    client:   34 MiB/s rd, 236 MiB/s wr, 1.28k op/s rd, 2.03k op/s wr
    recovery: 6.4 GiB/s, 2.39k objects/s
[root@gnosis ~]# ceph config set osd/class:hdd osd_max_backfills 3
[root@gnosis ~]# ceph status
  cluster:
    id:     ###
    health: HEALTH_OK

  services:
    mon: 5 daemons, quorum ceph-01,ceph-02,ceph-03,ceph-25,ceph-26 (age 7d)
    mgr: ceph-25(active, since 10w), standbys: ceph-03, ceph-02, ceph-01, ceph-26
    mds: con-fs2:8 4 up:standby 8 up:active
    osd: 1260 osds: 1260 up (since 2d), 1260 in (since 2d); 6481 remapped pgs

  task status:

  data:
    pools:   14 pools, 25065 pgs
    objects: 1.49G objects, 2.8 PiB
    usage:   3.4 PiB used, 9.7 PiB / 13 PiB avail
    pgs:     2465974875/12911218789 objects misplaced (19.099%)
             18578 active+clean
             6339  active+remapped+backfill_wait
             142   active+remapped+backfilling
             6     active+clean+snaptrim

  io:
    client:   32 MiB/s rd, 247 MiB/s wr, 1.10k op/s rd, 1.57k op/s wr
    recovery: 4.2 GiB/s, 1.56k objects/s
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14